I. INTRODUCTION

We begin, as a way of entering our subject, by characterizing a particu- 
lar interpretation of quantum theory which, although not representative of 
the more careful formulations of some writers, is the most common form 
encountered in textbooks and university lectures on the subject. 

A physical system is described completely by a state function $\psi$,
which is an element of a Hilbert space, and which furthermore gives
information only concerning the probabilities of the results of various
observations which can be made on the system. The state function $\psi$ is
thought of as objectively characterizing the physical system, i.e., at all
times an isolated system is thought of as possessing a state function,
independently of our state of knowledge of it. On the other hand, $\psi$
changes in a causal manner so long as the system remains isolated, obeying
a differential equation. Thus there are two fundamentally different ways in
which the state function can change:¹

Process 1: The discontinuous change brought about by the observation
of a quantity with eigenstates $\phi_1, \phi_2, \dots$, in which the state
$\psi$ will be changed to the state $\phi_j$ with probability
$|(\psi,\phi_j)|^2$.

Process 2: The continuous, deterministic change of state of the
(isolated) system with time according to a wave equation
$\partial\psi/\partial t = U\psi$, where U is a linear operator.
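
The two processes can be illustrated numerically for a two-state system. The following Python sketch rests on assumptions not made in the text: the state is given by its components in the eigenbasis of the observed quantity, the generator of Process 2 is taken to be diagonal in that basis with hypothetical values `energies`, and units are chosen so that each component simply acquires a phase. The function names are illustrative only.

```python
import cmath
import math

def born_probabilities(psi, eigenstates):
    """Process 1: probability |(psi, phi_j)|^2 of a jump to each eigenstate phi_j."""
    probs = []
    for phi in eigenstates:
        inner = sum(a.conjugate() * b for a, b in zip(phi, psi))  # (phi_j, psi)
        probs.append(abs(inner) ** 2)
    return probs

def evolve(psi, energies, t):
    """Process 2 for a diagonal generator: component k acquires exp(-i E_k t)."""
    return [cmath.exp(-1j * e * t) * a for e, a in zip(energies, psi)]

psi = [math.sqrt(0.25), math.sqrt(0.75)]    # normalized state (assumed example)
basis = [[1.0, 0.0], [0.0, 1.0]]            # eigenstates phi_1, phi_2
print(born_probabilities(psi, basis))       # probabilities of the two outcomes

later = evolve(psi, energies=[1.0, 2.0], t=0.7)
print(sum(abs(a) ** 2 for a in later))      # the norm is unchanged by Process 2
print(born_probabilities(later, basis))     # and so are these probabilities
```

In this sketch Process 2 never changes the outcome probabilities for the chosen basis, while Process 1 would replace the state by one of the basis vectors at random.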



¹ We use here the terminology of von Neumann [17].
The question of the consistency of the scheme arises if one contemplates
regarding the observer and his object-system as a single (composite)
physical system. Indeed, the situation becomes quite paradoxical if we 
allow for the existence of more than one observer. Let us consider the 
case of one observer A, who is performing measurements upon a system S, 
the totality (A + S) in turn forming the object-system for another observer, 
B. 

If we are to deny the possibility of B's use of a quantum mechanical 
description (wave function obeying wave equation) for A + S, then we 
must be supplied with some alternative description for systems which con- 
tain observers (or measuring apparatus). Furthermore, we would have to 
have a criterion for telling precisely what type of systems would have the 
preferred positions of "measuring apparatus" or "observer" and be sub- 
ject to the alternate description. Such a criterion is probably not capable 
of rigorous formulation. 

On the other hand, if we do allow B to give a quantum description to 
A + S, by assigning a state function $\psi^{A+S}$, then, so long as B does not
interact with A + S, its state changes causally according to Process 2, 
even though A may be performing measurements upon S. From B's point 
of view, nothing resembling Process 1 can occur (there are no discontinui- 
ties), and the question of the validity of A's use of Process 1 is raised. 
That is, apparently either A is incorrect in assuming Process 1, with its 
probabilistic implications, to apply to his measurements, or else B's state 
function, with its purely causal character, is an inadequate description of 
what is happening to A + S. 

To better illustrate the paradoxes which can arise from strict adher- 
ence to this interpretation we consider the following amusing, but extremely 
hypothetical drama. 

Isolated somewhere out in space is a room containing an observer,
A, who is about to perform a measurement upon a system S. After
performing his measurement he will record the result in his notebook.
We assume that he knows the state function of S (perhaps as a result
of previous measurement), and that it is not an eigenstate of the
measurement he is about to perform. A, being an orthodox quantum
theorist, then believes that the outcome of his measurement is undetermined
and that the process is correctly described by Process 1.

In the meantime, however, there is another observer, B, outside 
the room, who is in possession of the state function of the entire room, 
including S, the measuring apparatus, and A, just prior to the mea- 
surement. B is only interested in what will be found in the notebook 
one week hence, so he computes the state function of the room for one 
week in the future according to Process 2. One week passes, and we 
find B still in possession of the state function of the room, which 
this equally orthodox quantum theorist believes to be a complete de- 
scription of the room and its contents. If B's state function calcula- 
tion tells beforehand exactly what is going to be in the notebook, then 
A is incorrect in his belief about the indeterminacy of the outcome of 
his measurement. We therefore assume that B's state function con- 
tains non-zero amplitudes over several of the notebook entries. 

At this point, B opens the door to the room and looks at the note- 
book (performs his observation). Having observed the notebook entry, 
he turns to A and informs him in a patronizing manner that since his 
(B's) wave function just prior to his entry into the room, which he 
knows to have been a complete description of the room and its contents, 
had non-zero amplitude over other than the present result of the mea- 
surement, the result must have been decided only when B entered the 
room, so that A, his notebook entry, and his memory about what 
occurred one week ago had no independent objective existence until 
the intervention by B. In short, B implies that A owes his present 
objective existence to B's generous nature which compelled him to 
intervene on his behalf. However, to B's consternation, A does not 
react with anything like the respect and gratitude he should exhibit 
towards B, and at the end of a somewhat heated reply, in which A 
conveys in a colorful manner his opinion of B and his beliefs, he 
rudely punctures B's ego by observing that if B's view is correct,
then he has no reason to feel complacent, since the whole present 
situation may have no objective existence, but may depend upon the 
future actions of yet another observer. 

It is now clear that the interpretation of quantum mechanics with which 
we began is untenable if we are to consider a universe containing more 
than one observer. We must therefore seek a suitable modification of this
scheme, or an entirely different system of interpretation. Several alterna- 
tives which avoid the paradox are: 

Alternative 1: To postulate the existence of only one observer in the 
universe. This is the solipsist position, in which each of us must 
hold the view that he alone is the only valid observer, with the 
rest of the universe and its inhabitants obeying at all times Process 
2 except when under his observation. 

This view is quite consistent, but one must feel uneasy when, for 
example, writing textbooks on quantum mechanics, describing Process 1, 
for the consumption of other persons to whom it does not apply. 

Alternative 2: To limit the applicability of quantum mechanics by 
asserting that the quantum mechanical description fails when 
applied to observers, or to measuring apparatus, or more generally 
to systems approaching macroscopic size. 

If we try to limit the applicability so as to exclude measuring apparatus, 
or in general systems of macroscopic size, we are faced with the difficulty 
of sharply defining the region of validity. For what n might a group of n 
particles be construed as forming a measuring device so that the quantum 
description fails? And to draw the line at human or animal observers, i.e., 
to assume that all mechanical apparata obey the usual laws, but that they
are somehow not valid for living observers, does violence to the so-called 
principle of psycho-physical parallelism,² and constitutes a view to be
avoided, if possible. To do justice to this principle we must insist that 
we be able to conceive of mechanical devices (such as servomechanisms), 
obeying natural laws, which we would be willing to call observers. 

Alternative 3: To admit the validity of the state function description, 
but to deny the possibility that B could ever be in possession of 
the state function of A + S. Thus one might argue that a determi- 
nation of the state of A would constitute such a drastic interven- 
tion that A would cease to function as an observer. 

The first objection to this view is that no matter what the state of 
A + S is, there is in principle a complete set of commuting operators for 
which it is an eigenstate, so that, at least, the determination of these 
quantities will not affect the state nor in any way disrupt the operation of 
A. There are no fundamental restrictions in the usual theory about the 
knowability of any state functions, and the introduction of any such re- 
strictions to avoid the paradox must therefore require extra postulates. 

The second objection is that it is not particularly relevant whether or 
not B actually knows the precise state function of A + S. If he merely 
believes that the system is described by a state function, which he does 
not presume to know, then the difficulty still exists. He must then believe 
that this state function changed deterministically, and hence that there 
was nothing probabilistic in A's determination. 



² In the words of von Neumann ([17], p. 418): "...it is a fundamental requirement
of the scientific viewpoint — the so-called principle of the psycho-physical parallel- 
ism — that it must be possible so to describe the extra-physical process of the sub- 
jective perception as if it were in reality in the physical world — i.e., to assign to 
its parts equivalent physical processes in the objective environment, in ordinary 
space." 




HUGH EVERETT, III



Alternative 4: To abandon the position that the state function is a 
complete description of a system. The state function is to be re- 
garded not as a description of a single system, but of an ensemble 
of systems, so that the probabilistic assertions arise naturally 
from the incompleteness of the description. 

It is assumed that the correct complete description, which would pre- 
sumably involve further (hidden) parameters beyond the state function 
alone, would lead to a deterministic theory, from which the probabilistic 
aspects arise as a result of our ignorance of these extra parameters in the 
same manner as in classical statistical mechanics. 

Alternative 5: To assume the universal validity of the quantum de- 
scription, by the complete abandonment of Process 1. The general 
validity of pure wave mechanics, without any statistical assertions, 
is assumed for all physical systems, including observers and mea- 
suring apparata. Observation processes are to be described com- 
pletely by the state function of the composite system which in- 
cludes the observer and his object-system, and which at all times 
obeys the wave equation (Process 2). 

This brief list of alternatives is not meant to be exhaustive, but has 
been presented in the spirit of a preliminary orientation. We have, in fact, 
omitted one of the foremost interpretations of quantum theory, namely the 
position of Niels Bohr. The discussion will be resumed in the final chap- 
ter, when we shall be in a position to give a more adequate appraisal of 
the various alternate interpretations. For the present, however, we shall 
concern ourselves only with the development of Alternative 5. 

It is evident that Alternative 5 is a theory of many advantages. It has 
the virtue of logical simplicity and it is complete in the sense that it is 
applicable to the entire universe. All processes are considered equally 
(there are no "measurement processes" which play any preferred role), 
and the principle of psycho-physical parallelism is fully maintained. Since 
the universal validity of the state function description is asserted, one
can regard the state functions themselves as the fundamental entities, 
and one can even consider the state function of the whole universe. In 
this sense this theory can be called the theory of the "universal wave 
function," since all of physics is presumed to follow from this function 
alone. There remains, however, the question whether or not such a theory 
can be put into correspondence with our experience. 

The present thesis is devoted to showing that this concept of a uni- 
versal wave mechanics, together with the necessary correlation machinery 
for its interpretation, forms a logically self-consistent description of a
universe in which several observers are at work. 

We shall be able to introduce into the theory systems which represent 
observers. Such systems can be conceived as automatically functioning 
machines (servomechanisms) possessing recording devices (memory) and 
which are capable of responding to their environment. The behavior of 
these observers shall always be treated within the framework of wave 
mechanics. Furthermore, we shall deduce the probabilistic assertions of 
Process 1 as subjective appearances to such observers, thus placing the 
theory in correspondence with experience. We are then led to the novel 
situation in which the formal theory is objectively continuous and causal, 
while subjectively discontinuous and probabilistic. While this point of 
view thus shall ultimately justify our use of the statistical assertions of 
the orthodox view, it enables us to do so in a logically consistent manner, 
allowing for the existence of other observers. At the same time it gives a 
deeper insight into the meaning of quantized systems, and the role played 
by quantum mechanical correlations. 

In order to bring about this correspondence with experience for the 
pure wave mechanical theory, we shall exploit the correlation between 
subsystems of a composite system which is described by a state function. 
A subsystem of such a composite system does not, in general, possess an 
independent state function. That is, in general a composite system cannot
be represented by a single pair of subsystem states, but can be
represented only by a superposition of such pairs of subsystem states. For
example, the Schrödinger wave function for a pair of particles,
$\psi(x_1,x_2)$, cannot always be written in the form
$\psi = \phi(x_1)\eta(x_2)$, but only in the form
$\psi = \sum_{i,j} a_{ij}\,\phi^i(x_1)\,\eta^j(x_2)$. In the latter case,
there is no single state for Particle 1 alone or Particle 2 alone, but only
the superposition of such cases.
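
As a small illustration of the distinction between the two forms, consider the simplest case in which each particle has only two basis states, so that the composite state is fixed by a 2 x 2 array of coefficients $a_{ij}$. The Python sketch below (the function name and the restriction to two basis states are assumptions made for the example) tests whether such an array factors into a single pair of subsystem states. For a 2 x 2 array this is equivalent to a vanishing determinant.

```python
def is_product_state(a, tol=1e-12):
    """True if the 2x2 coefficient array a[i][j] factors as c[i] * d[j].

    A 2x2 array has this form exactly when its determinant vanishes (rank one).
    """
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return abs(det) < tol

s = 2 ** -0.5
product = [[0.5, 0.5], [0.5, 0.5]]     # (phi^1 + phi^2)(eta^1 + eta^2) / 2
superposed = [[s, 0.0], [0.0, s]]      # (phi^1 eta^1 + phi^2 eta^2) / sqrt(2)

print(is_product_state(product))       # True: a single pair of subsystem states
print(is_product_state(superposed))    # False: only a superposition of pairs
```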

In fact, to any arbitrary choice of state for one subsystem there will 
correspond a relative state for the other subsystem, which will generally 
be dependent upon the choice of state for the first subsystem, so that the 
state of one subsystem is not independent, but correlated to the state of 
the remaining subsystem. Such correlations between systems arise from 
interaction of the systems, and from our point of view all measurement and 
observation processes are to be regarded simply as interactions between 
observer and object-system which produce strong correlations. 
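
The notion of a relative state can be sketched for a finite-dimensional composite system. In the Python fragment below, which is an illustration under assumed conventions rather than the formal definition given later, the composite state is specified by coefficients `a[i][j]` with respect to product basis states, and a state of the first subsystem is specified by its coefficients `chosen` in the first basis. The relative state of the second subsystem is obtained by projecting onto the chosen state and normalizing.

```python
import math

def relative_state(a, chosen):
    """Relative state of subsystem 2 for a chosen state of subsystem 1.

    a[i][j]  : coefficients of the composite state in a product basis
    chosen[i]: coefficients of the chosen subsystem-1 state in the first basis
    Returns normalized coefficients in the second basis.
    """
    raw = [sum(c.conjugate() * row[j] for c, row in zip(chosen, a))
           for j in range(len(a[0]))]
    norm = math.sqrt(sum(abs(x) ** 2 for x in raw))
    if norm == 0:
        raise ValueError("the chosen state does not occur in the composite state")
    return [x / norm for x in raw]

s = 2 ** -0.5
composite = [[s, 0.0], [0.0, s]]   # correlated state (assumed example)

print(relative_state(composite, [1.0, 0.0]))   # depends on the choice made
print(relative_state(composite, [0.0, 1.0]))   # for the first subsystem
print(relative_state(composite, [s, s]))
```

For a product state the result would be the same for every choice, which is one way of saying that the subsystems are uncorrelated.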

Let one regard an observer as a subsystem of the composite system: 
observer + object-system. It is then an inescapable consequence that 
after the interaction has taken place there will not, generally, exist a 
single observer state. There will, however, be a superposition of the com- 
posite system states, each element of which contains a definite observer 
state and a definite relative object-system state. Furthermore, as we shall 
see, each of these relative object-system states will be, approximately, 
the eigenstates of the observation corresponding to the value obtained by 
the observer which is described by the same element of the superposition. 
Thus, each element of the resulting superposition describes an observer 
who perceived a definite and generally different result, and to whom it
appears that the object-system state has been transformed into the corre- 
sponding eigenstate. In this sense the usual assertions of Process 1 
appear to hold on a subjective level to each observer described by an ele- 
ment of the superposition. We shall also see that correlation plays an 
important role in preserving consistency when several observers are present 
and allowed to interact with one another (to "consult" one another) as 
well as with other object-systems. 
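
The structure of the superposition just described can be sketched with a toy model. The Python fragment below assumes an idealized interaction in which an observer, initially in a "ready" state, ends with the memory record "saw i" whenever the object-system is in the eigenstate labelled i. The dictionary representation, the function names, and the record strings are all hypothetical devices for the example.

```python
def measure(object_state):
    """Idealized interaction: each eigenstate i becomes the pair (i, 'saw i').

    object_state maps eigenstate labels to amplitudes; the result maps
    (label, memory record) pairs to the same amplitudes.
    """
    return {(i, "saw %d" % i): amp for i, amp in object_state.items() if amp != 0}

def relative_object_state(composite, memory):
    """Object-system state relative to a given observer memory record."""
    branch = {i: amp for (i, m), amp in composite.items() if m == memory}
    norm = sum(abs(a) ** 2 for a in branch.values()) ** 0.5
    return {i: a / norm for i, a in branch.items()}

before = {1: 0.6, 2: 0.8}          # object-system not in an eigenstate
after = measure(before)

print(after)                                    # superposition of two elements
print(relative_object_state(after, "saw 1"))    # eigenstate 1 only
print(relative_object_state(after, "saw 2"))    # eigenstate 2 only
```

No element of the superposition is singled out, yet relative to each memory record the object-system is in the corresponding eigenstate.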
In order to develop a language for interpreting our pure wave mechanics
for composite systems we shall find it useful to develop quantitative
definitions for such notions as the "sharpness" or "definiteness" of an
operator A for a state $\psi$, and the "degree of correlation" between the
subsystems of a composite system or between a pair of operators in the
subsystems, so that we can use these concepts in an unambiguous manner.
The mathematical development of these notions will be carried out in the
next chapter (II) using some concepts borrowed from Information Theory.³
We shall develop there the general definitions of information and correla- 
tion, as well as some of their more important properties. Throughout 
Chapter II we shall use the language of probability theory to facilitate the 
exposition, and because it enables us to introduce in a unified manner a 
number of concepts that will be of later use. We shall nevertheless sub- 
sequently apply the mathematical definitions directly to state functions, 
by replacing probabilities by square amplitudes, without, however, making 
any reference to probability models. 

Having set the stage, so to speak, with Chapter II, we turn to quantum 
mechanics in Chapter III. There we first investigate the quantum forma- 
lism of composite systems, particularly the concept of relative state func- 
tions, and the meaning of the representation of subsystems by non- 
interfering mixtures of states characterized by density matrices. The 
notions of information and correlation are then applied to quantum mechan- 
ics. The final section of this chapter discusses the measurement process, 
which is regarded simply as a correlation-inducing interaction between 
subsystems of a single isolated system. A simple example of such a 
measurement is given and discussed, and some general consequences of 
the superposition principle are considered. 



³ The theory originated by Claude E. Shannon [19].
This will be followed by an abstract treatment of the problem of
Observation (Chapter IV). In this chapter we make use only of the super- 
position principle, and general rules by which composite system states 
are formed of subsystem states, in order that our results shall have the 
greatest generality and be applicable to any form of quantum theory for 
which these principles hold. (Elsewhere, when giving examples, we restrict
ourselves to the non-relativistic Schrödinger Theory for simplicity.)
The validity of Process 1 as a subjective phenomenon is deduced, as well 
as the consistency of allowing several observers to interact with one 
another. 

Chapter V supplements the abstract treatment of Chapter IV by discus- 
sing a number of diverse topics from the point of view of the theory of 
pure wave mechanics, including the existence and meaning of macroscopic 
objects in the light of their atomic constitution, amplification processes 
in measurement, questions of reversibility and irreversibility, and approxi- 
mate measurement. 

The final chapter summarizes the situation, and continues the discus- 
sion of alternate interpretations of quantum mechanics. 



II. PROBABILITY, INFORMATION, AND CORRELATION 



The present chapter is devoted to the mathematical development of the 
concepts of information and correlation. As mentioned in the introduction 
we shall use the language of probability theory throughout this chapter to 
facilitate the exposition, although we shall apply the mathematical defini- 
tions and formulas in later chapters without reference to probability models. 
We shall develop our definitions and theorems in full generality, for proba- 
bility distributions over arbitrary sets, rather than merely for distributions 
over real numbers, in which we are mainly interested at present. We
take this course because it is as easy as the restricted development, and 
because it gives a better insight into the subject. 

The first three sections develop definitions and properties of informa- 
tion and correlation for probability distributions over finite sets only. In 
section four the definition of correlation is extended to distributions over 
arbitrary sets, and the general invariance of the correlation is proved. 
Section five then generalizes the definition of information to distributions 
over arbitrary sets. Finally, as illustrative examples, sections seven and 
eight give brief applications to stochastic processes and classical mechan- 
ics, respectively. 

§1. Finite joint distributions 

We assume that we have a collection of finite sets,
$\mathcal{X},\mathcal{Y},\dots,\mathcal{Z}$, whose elements are denoted by
$x_i \in \mathcal{X}$, $y_j \in \mathcal{Y}$, ..., $z_k \in \mathcal{Z}$,
etc., and that we have a joint probability distribution,
$P = P(x_i,y_j,\dots,z_k)$, defined on the cartesian product of the sets,
which represents the probability of the combined event $x_i$, $y_j$, ...,
and $z_k$. We then denote by X,Y,...,Z the random variables whose values
are the elements of the sets $\mathcal{X},\mathcal{Y},\dots,\mathcal{Z}$,
with probabilities given by P.

For any subset Y,...,Z, of a set of random variables W,...,X, Y,...,Z,
with joint probability distribution $P(w_i,\dots,x_j,y_k,\dots,z_\ell)$, the
marginal distribution, $P(y_k,\dots,z_\ell)$, is defined to be:

(1.1)    $P(y_k,\dots,z_\ell) = \sum_{i,\dots,j} P(w_i,\dots,x_j,y_k,\dots,z_\ell)$ ,

which represents the probability of the joint occurrence of
$y_k,\dots,z_\ell$, with no restrictions upon the remaining variables.
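
Definition (1.1) can be sketched directly in code. In the Python fragment below a joint distribution is represented, as an assumed convention for this example, by a dictionary mapping outcome tuples to probabilities, and the variables to be kept are named by their positions in the tuple. The numerical values are hypothetical.

```python
from collections import defaultdict

def marginal(joint, keep):
    """Marginal (1.1): sum the joint over every position not listed in `keep`."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)

# Hypothetical joint distribution P(w_i, y_k) over two two-valued variables.
joint = {("w1", "y1"): 0.1, ("w1", "y2"): 0.3,
         ("w2", "y1"): 0.2, ("w2", "y2"): 0.4}

p_y = marginal(joint, keep=[1])    # sum over the values of W
print(p_y)
```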

For any subset Y,...,Z of a set of random variables the conditional
distribution, conditioned upon the values $W = w_i,\dots,X = x_j$ for any
remaining subset W,...,X, and denoted by
$P^{w_i,\dots,x_j}(y_k,\dots,z_\ell)$, is defined to be:¹

(1.2)    $P^{w_i,\dots,x_j}(y_k,\dots,z_\ell) = \frac{P(w_i,\dots,x_j,y_k,\dots,z_\ell)}{P(w_i,\dots,x_j)}$ ,

which represents the probability of the joint event
$Y = y_k,\dots,Z = z_\ell$, conditioned by the fact that W,...,X are known
to have taken the values $w_i,\dots,x_j$, respectively.
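
A corresponding sketch of definition (1.2) follows. The dictionary representation of the joint distribution and the argument `given`, which maps tuple positions to the conditioning values, are assumptions of the example. The function returns `None` when the conditioning event has probability zero, in which case the conditional distribution is regarded as undefined.

```python
def conditional(joint, given):
    """Conditional distribution (1.2) over the variables not fixed by `given`."""
    match = {o: p for o, p in joint.items()
             if all(o[i] == v for i, v in given.items())}
    total = sum(match.values())            # P(w_i, ..., x_j)
    if total == 0:
        return None                        # undefined for a zero-probability event
    free = [i for i in range(len(next(iter(joint)))) if i not in given]
    out = {}
    for o, p in match.items():
        key = tuple(o[i] for i in free)
        out[key] = out.get(key, 0.0) + p / total
    return out

# Hypothetical joint distribution P(w_i, y_k).
joint = {("w1", "y1"): 0.1, ("w1", "y2"): 0.3,
         ("w2", "y1"): 0.2, ("w2", "y2"): 0.4}

print(conditional(joint, {0: "w1"}))   # distribution of Y given W = w1
print(conditional(joint, {0: "w2"}))   # distribution of Y given W = w2
```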

For any numerical valued function $F(y_k,\dots,z_\ell)$, defined on the
elements of the cartesian product of $\mathcal{Y},\dots,\mathcal{Z}$, the
expectation, denoted by Exp [F], is defined to be:

(1.3)    $\mathrm{Exp}[F] = \sum_{k,\dots,\ell} P(y_k,\dots,z_\ell)\, F(y_k,\dots,z_\ell)$ .

We note that if $P(y_k,\dots,z_\ell)$ is a marginal distribution of some
larger distribution $P(w_i,\dots,x_j,y_k,\dots,z_\ell)$ then

(1.4)    $\mathrm{Exp}[F] = \sum_{k,\dots,\ell} \Bigl( \sum_{i,\dots,j} P(w_i,\dots,x_j,y_k,\dots,z_\ell) \Bigr) F(y_k,\dots,z_\ell)$
         $= \sum_{i,\dots,j,k,\dots,\ell} P(w_i,\dots,x_j,y_k,\dots,z_\ell)\, F(y_k,\dots,z_\ell)$ ,

so that if we wish to compute Exp [F] with respect to some joint
distribution it suffices to use any marginal distribution of the original
distribution which contains at least those variables which occur in F.

¹ We regard it as undefined if $P(w_i,\dots,x_j) = 0$. In this case
$P(w_i,\dots,x_j,y_k,\dots,z_\ell)$ is necessarily zero also.
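
The content of (1.3) and (1.4) can be checked on a small case. In the Python sketch below, with an assumed dictionary representation and hypothetical numerical values, the expectation of a function of Y alone is computed once from the marginal distribution of Y and once from the full joint distribution of W and Y.

```python
def expectation(dist, f):
    """Exp[F] as in (1.3): sum of P(outcome) * F(outcome)."""
    return sum(p * f(*outcome) for outcome, p in dist.items())

def marginal(joint, keep):
    """Marginal distribution over the tuple positions listed in `keep`."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

# Hypothetical joint distribution P(w_i, y_k); F depends on y alone.
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.2, (1, 2): 0.4}

def f(y):
    return y * y

via_marginal = expectation(marginal(joint, keep=[1]), f)
via_joint = expectation(joint, lambda w, y: f(y))
print(via_marginal, via_joint)   # the two agree, as (1.4) asserts
```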

We shall also occasionally be interested in conditional expectations, 
which we define as: 

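(1.5)    $\mathrm{Exp}^{w_i,\dots,x_j}[F] = \sum_{k,\dots,\ell} P^{w_i,\dots,x_j}(y_k,\dots,z_\ell)\, F(y_k,\dots,z_\ell)$ ,

and we note the following easily verified rules for expectations:

(1.6)    $\mathrm{Exp}[\mathrm{Exp}[F]] = \mathrm{Exp}[F]$ ,

(1.7)    $\mathrm{Exp}^{u_i,\dots,v_j}[\mathrm{Exp}^{u_i,\dots,v_j,w_k,\dots,x_\ell}[F]] = \mathrm{Exp}^{u_i,\dots,v_j}[F]$ ,

(1.8)    $\mathrm{Exp}[F+G] = \mathrm{Exp}[F] + \mathrm{Exp}[G]$ .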

We should like finally to comment upon the notion of independence. 
Two random variables X and Y with joint distribution $P(x_i,y_j)$ will be
said to be independent if and only if $P(x_i,y_j)$ is equal to
$P(x_i)P(y_j)$ for all i, j. Similarly, the groups of random variables
(U...V), (W...X),..., (Y...Z) will be called mutually independent groups if
and only if $P(u_i,\dots,v_j,w_k,\dots,x_\ell,\dots,y_m,\dots,z_n)$ is always
equal to $P(u_i,\dots,v_j)\,P(w_k,\dots,x_\ell)\cdots P(y_m,\dots,z_n)$.

Independence means that the random variables take on values which 
are not influenced by the values of other variables with respect to which 
they are independent. That is, the conditional distribution of one of two 
independent variables, Y, conditioned upon the value $x_i$ for the other,
is independent of $x_i$, so that knowledge about one variable tells nothing
of the other. 
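
The definition of independence for a pair of variables can be sketched as a direct test of the product rule. The Python fragment below assumes the joint distribution is given as a dictionary over pairs, with missing pairs taken to have probability zero, and compares within a small numerical tolerance. The example distributions are hypothetical.

```python
def is_independent(joint, tol=1e-12):
    """True if P(x_i, y_j) = P(x_i) P(y_j) for every pair (x_i, y_j)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x in px for y in py)

# A product distribution, and one in which Y is fixed by X.
independent = {(x, y): p * q
               for x, p in {"a": 0.3, "b": 0.7}.items()
               for y, q in {"c": 0.4, "d": 0.6}.items()}
dependent = {("a", "c"): 0.5, ("b", "d"): 0.5}

print(is_independent(independent))   # True
print(is_independent(dependent))     # False
```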

§2. Information for finite distributions 

Suppose that we have a single random variable X, with distribution
$P(x_i)$. We then define² a number, $I_X$, called the information of X, to be:

(2.1)    $I_X = \sum_i P(x_i) \ln P(x_i) = \mathrm{Exp}[\ln P(x_i)]$ ,

which is a function of the probabilities alone and not of any possible
numerical values of the $x_i$'s themselves.³

The information is essentially a measure of the sharpness of a probability
distribution, that is, an inverse measure of its "spread." In this
respect information plays a role similar to that of variance. However, it
has a number of properties which make it a better measure of "sharpness"
than the variance, not the least of which is the fact that it can be
defined for distributions over arbitrary sets, while variance is defined
only for distributions over real numbers.

Any change in the distribution $P(x_i)$ which "levels out" the
probabilities decreases the information. It has the value zero for
"perfectly sharp" distributions, in which the probability is one for one of
the $x_i$ and zero for all others, and ranges downward to $-\ln n$ for
distributions over n elements which are equal over all of the $x_i$. The
fact that the information is nonpositive is no liability, since we are
seldom interested in the absolute information of a distribution, but only
in differences.

We can generalize (2.1) to obtain the formula for the information of a
group of random variables X, Y,...,Z, with joint distribution
$P(x_i,y_j,\dots,z_k)$, which we denote by $I_{XY\dots Z}$:

(2.2)    $I_{XY\dots Z} = \sum_{i,j,\dots,k} P(x_i,y_j,\dots,z_k) \ln P(x_i,y_j,\dots,z_k)$
         $= \mathrm{Exp}[\ln P(x_i,y_j,\dots,z_k)]$ ,

which follows immediately from our previous definition, since the group of
random variables X,Y,...,Z may be regarded as a single random variable
W which takes its values in the cartesian product
$\mathcal{X} \times \mathcal{Y} \times \cdots \times \mathcal{Z}$.

Finally, we define a conditional information,
$I^{v_m,\dots,w_n}_{XY\dots Z}$, to be:

(2.3)    $I^{v_m,\dots,w_n}_{XY\dots Z} = \sum_{i,j,\dots,k} P^{v_m,\dots,w_n}(x_i,y_j,\dots,z_k) \ln P^{v_m,\dots,w_n}(x_i,y_j,\dots,z_k)$
         $= \mathrm{Exp}^{v_m,\dots,w_n}[\ln P^{v_m,\dots,w_n}(x_i,y_j,\dots,z_k)]$ ,

a quantity which measures our information about X, Y,...,Z given that we
know that V...W have taken the particular values $v_m,\dots,w_n$.

² This definition corresponds to the negative of the entropy of a probability
distribution as defined by Shannon [19].

³ A good discussion of information is to be found in Shannon [19], or Woodward
[21]. Note, however, that in the theory of communication one defines the
information of a state $x_i$, which has a priori probability $P_i$, to be
$-\ln P_i$. We prefer, however, to regard information as a property of the
distribution itself.

For independent random variables X, Y,...,Z, the following relationship
is easily proved:

(2.4)    $I_{XY\dots Z} = I_X + I_Y + \cdots + I_Z$    (X, Y,...,Z independent),

so that the information of XY...Z is the sum of the individual quantities 
of information, which is in accord with our intuitive feeling that if we are 
given information about unrelated events, our total knowledge is the sum 
of the separate amounts of information. We shall generalize this definition 
later, in §5. 
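
The definition of information and its stated properties can be checked numerically. The Python sketch below assumes a dictionary representation of the distributions, adopts the usual convention that terms with zero probability contribute nothing to the sum, and uses hypothetical numerical values. It evaluates the information of a perfectly sharp and of a uniform distribution over four elements, and verifies the additivity (2.4) for a product distribution.

```python
import math

def information(dist):
    """Information (2.1): sum of P ln P, the negative of the entropy in nats."""
    return sum(p * math.log(p) for p in dist.values() if p > 0)

sharp = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}
uniform = {k: 0.25 for k in "abcd"}

print(information(sharp))                    # zero for a perfectly sharp case
print(information(uniform), -math.log(4))    # lower limit -ln n with n = 4

# Additivity (2.4) for independent X and Y.
px = {"x1": 0.2, "x2": 0.8}
py = {"y1": 0.5, "y2": 0.3, "y3": 0.2}
joint = {(x, y): p * q for x, p in px.items() for y, q in py.items()}
print(information(joint), information(px) + information(py))
```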

§3. Correlation for finite distributions 

Suppose that we have a pair of random variables, X and Y, with 
joint distribution $P(x_i,y_j)$. If we say that X and Y are correlated,
what we intuitively mean is that one learns something about one variable
when he is told the value of the other. Let us focus our attention upon
the variable X. If we are not informed of the value of Y, then our
information concerning X, $I_X$, is calculated from the marginal
distribution $P(x_i)$. However, if we are now told that Y has the value
$y_j$, then our information about X changes to the information of the
conditional distribution $P^{y_j}(x_i)$, $I_X^{y_j}$. According to what we
have said, we wish the degree of correlation to measure how much we learn
about X by being informed of Y's value. However, since the change of
information, $I_X^{y_j} - I_X$, may depend upon the particular value,
$y_j$, of Y which we are told, the natural thing to do to arrive at a
single number to measure the strength of correlation is to consider the
expected change in information about X, given that we are to be told the
value of Y. This quantity we call the correlation information, or for
brevity, the correlation, of X and Y, and denote it by $\{X,Y\}$. Thus:

(3.1)    $\{X,Y\} = \mathrm{Exp}[I_X^{y_j} - I_X] = \mathrm{Exp}[I_X^{y_j}] - I_X$ .

Expanding the quantity $\mathrm{Exp}[I_X^{y_j}]$ using (2.3) and the rules
for expectations (1.6)-(1.8) we find:

         $\mathrm{Exp}[I_X^{y_j}] = \mathrm{Exp}[\mathrm{Exp}^{y_j}[\ln P^{y_j}(x_i)]]$

(3.2)    $= \mathrm{Exp}\Bigl[\ln \frac{P(x_i,y_j)}{P(y_j)}\Bigr] = \mathrm{Exp}[\ln P(x_i,y_j)] - \mathrm{Exp}[\ln P(y_j)]$

         $= I_{XY} - I_Y$ ,

and combining with (3.1) we have:

(3.3)    $\{X,Y\} = I_{XY} - I_X - I_Y$ .

Thus the correlation is symmetric between X and Y, and hence also 
equal to the expected change of information about Y given that we will 
be told the value of X. Furthermore, according to (3.3) the correlation 
corresponds precisely to the amount of "missing information" if we 
possess only the marginal distributions, i.e., the loss of information if we 
choose to regard the variables as independent. 
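
The equivalence of (3.1) and (3.3), and the symmetry of the correlation, can be verified on a small example. The Python sketch below assumes a dictionary representation of the joint distribution over pairs and uses hypothetical probabilities. It computes the correlation once as the expected change of information about X and once from the three informations.

```python
import math

def information(dist):
    """Information (2.1): sum of P ln P, with zero-probability terms omitted."""
    return sum(p * math.log(p) for p in dist.values() if p > 0)

def marginals(joint):
    """Marginal distributions of X and Y from a joint over pairs (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def correlation(joint):
    """Correlation by (3.3): I_XY - I_X - I_Y."""
    px, py = marginals(joint)
    return information(joint) - information(px) - information(py)

def expected_information_change(joint):
    """Correlation by (3.1): expected value of I_X^{y_j}, minus I_X."""
    px, py = marginals(joint)
    expected = 0.0
    for y, p_y in py.items():
        if p_y > 0:
            cond = {x: joint.get((x, y), 0.0) / p_y for x in px}
            expected += p_y * information(cond)
    return expected - information(px)

joint = {("x1", "y1"): 0.4, ("x1", "y2"): 0.1,
         ("x2", "y1"): 0.1, ("x2", "y2"): 0.4}
swapped = {(y, x): p for (x, y), p in joint.items()}

print(correlation(joint), expected_information_change(joint))
print(correlation(swapped))   # same value: the correlation is symmetric
```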



THEOREM 1. $\{X,Y\} = 0$ if and only if X and Y are independent, and
is otherwise strictly positive. (Proof in Appendix I.) 
In this respect the correlation so defined is superior to the usual
correlation coefficients of statistics, such as covariance, etc., which can be
zero even when the variables are not independent, and which can assume
both positive and negative values. An inverse correlation is, after all,
quite as useful as a direct correlation. Furthermore, it has the great
advantage of depending upon the probabilities alone, and not upon any
numerical values of $x_i$ and $y_j$, so that it is defined for distributions
over sets whose elements are of an arbitrary nature, and not only for dis- 
tributions over numerical properties. For example, we might have a joint 
probability distribution for the political party and religious affiliation of 
individuals. Correlation and information are defined for such distributions, 
although they possess nothing like covariance or variance. 
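
A standard example illustrates the remark that covariance can vanish for variables which are not independent. In the Python sketch below, X is assumed to take the values -1, 0, 1 with equal probability and Y is taken to be the square of X, so that Y is completely determined by X. The choice of example and the function names are ours.

```python
import math

def information(dist):
    """Information (2.1): sum of P ln P, with zero-probability terms omitted."""
    return sum(p * math.log(p) for p in dist.values() if p > 0)

def correlation(joint):
    """Correlation (3.3): I_XY - I_X - I_Y for a joint over pairs (x, y)."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return information(joint) - information(px) - information(py)

def covariance(joint):
    """Ordinary covariance Exp[(X - Exp X)(Y - Exp Y)] for numerical values."""
    ex = sum(p * x for (x, y), p in joint.items())
    ey = sum(p * y for (x, y), p in joint.items())
    return sum(p * (x - ex) * (y - ey) for (x, y), p in joint.items())

third = 1.0 / 3.0
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}   # Y = X squared

print(covariance(joint))    # vanishes although Y is a function of X
print(correlation(joint))   # strictly positive, equal to ln 3 - (2/3) ln 2
```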

We can generalize (3.3) to define a group correlation for the groups of 
random variables (U...V), (W...X),..., (Y...Z), denoted by
$\{U\dots V, W\dots X, \dots, Y\dots Z\}$ (where the groups are separated by
commas), to be:

(3.4)    $\{U\dots V, W\dots X, \dots, Y\dots Z\} = I_{U\dots V\,W\dots X \dots Y\dots Z}$
         $- I_{U\dots V} - I_{W\dots X} - \cdots - I_{Y\dots Z}$ ,

again measuring the information deficiency for the group marginals.
Theorem 1 is also satisfied by the group correlation, so that it is zero if and
only if the groups are mutually independent. We can, of course, also de- 
fine conditional correlations in the obvious manner, denoting these quanti- 
ties by appending the conditional values as superscripts, as before. 

We conclude this section by listing some useful formulas and inequali- 
ties which are easily proved: 

(3.5)    $\{U,V,\dots,W\} = \mathrm{Exp}\Bigl[\ln \frac{P(u_i,v_j,\dots,w_k)}{P(u_i)P(v_j)\cdots P(w_k)}\Bigr]$ ,

(3.6)    $\{U,V,\dots,W\}^{x_i \dots y_j} = \mathrm{Exp}^{x_i \dots y_j}\Bigl[\ln \frac{P^{x_i \dots y_j}(u_k,v_\ell,\dots,w_m)}{P^{x_i \dots y_j}(u_k)\,P^{x_i \dots y_j}(v_\ell)\cdots P^{x_i \dots y_j}(w_m)}\Bigr]$
         (conditional correlation) ,
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|...,U,V,...I = |...,UV,...i + |U,Vi , 

(3.7) 

i...,U,V,...,W,...! = i...,UV...W,...| + iU,V,...,W! (comma removal) 

(3.8) !...,U,VW,...l - {...,UV,W,...i = {U,V! - {V,W| (commutator) , 

(3.9) |Xi = (definition of bracket with no commas) , 

(3.10) i...,XXV,...! = {...,XV,...i 
(removal of repeated variable within a group) , 

(3.11) |...,UV,VW,...} = j...,UV,W,...l - {V,W| - I y 
(removal of repeated variable in separate groups) , 

(3.12) iX,Xi = - I x (self correlation) , 



(3.13) 



lu.vw.xi J = iu,v,xl J , 

...W-... ...w ; ... 

lU.W.X} J = fu,x| J 
(removal of conditioned variables) , 



(3.14) iXY.Zl > |X,Z| , 

(3.15) 1XY.Z! ^ iX.Zl + \Y,Z] - fX,Y| , 

(3.16) iX,Y,Z! > iX,Yl + |X,Z! . 

Note that in the above formulas any random variable W may be replaced
by any group XY...Z and the relation holds true, since the set XY...Z may
be regarded as the single random variable W, which takes its values in the
cartesian product $\mathcal{X} \times \mathcal{Y} \times \cdots \times \mathcal{Z}$.
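The formulas above can be spot-checked numerically. The sketch below (an illustration under our own assumptions, not a proof) builds an arbitrary three-variable distribution from seeded pseudo-random weights and tests the comma-removal rule (3.7) and the inequalities (3.14) to (3.16). The helpers `info`, `marginal` and `corr` are hypothetical names; groups are given as tuples of axis positions.

```python
import math
import random

def info(p):
    """Information sum p ln p of a finite distribution given as a dict."""
    return sum(q * math.log(q) for q in p.values() if q > 0)

def marginal(joint, axes):
    """Marginal of a joint {(x, y, z): prob} over the listed axis positions."""
    out = {}
    for key, q in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + q
    return out

def corr(joint, *groups):
    """Group correlation (3.4): information of all listed variables together
    minus the information of each group's marginal."""
    all_axes = tuple(a for g in groups for a in g)
    return info(marginal(joint, all_axes)) - sum(info(marginal(joint, g)) for g in groups)

random.seed(0)
w = {(x, y, z): random.random() for x in range(2) for y in range(3) for z in range(2)}
total = sum(w.values())
P = {k: v / total for k, v in w.items()}   # arbitrary test distribution

X, Y, Z, XY = (0,), (1,), (2,), (0, 1)
tol = 1e-12
comma_removal = abs(corr(P, X, Y, Z) - (corr(P, XY, Z) + corr(P, X, Y))) < tol  # (3.7)
ineq_314 = corr(P, XY, Z) >= corr(P, X, Z) - tol
ineq_315 = corr(P, XY, Z) >= corr(P, X, Z) + corr(P, Y, Z) - corr(P, X, Y) - tol
ineq_316 = corr(P, X, Y, Z) >= corr(P, X, Y) + corr(P, X, Z) - tol
print(comma_removal, ineq_314, ineq_315, ineq_316)
```

Treating the pair XY as a single group is exactly the replacement of a variable by a group described in the note above.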

§4. Generalization and further properties of correlation 

Until now we have been concerned only with finite probability distri- 
butions, for which we have defined information and correlation. We shall 
now generalize the definition of correlation so as to be applicable to joint 
probability distributions over arbitrary sets of unrestricted cardinality.

We first consider the effects of refinement of a finite distribution. For
example, we may discover that the event $x_i$ is actually the disjunction
of several exclusive events $\tilde{x}_i^1, \ldots, \tilde{x}_i^\mu$, so that $x_i$ occurs if any
one of the $\tilde{x}_i^\mu$ occurs, i.e., the single event $x_i$ results from failing to
distinguish between the $\tilde{x}_i^\mu$. The probability distribution which
distinguishes between the $\tilde{x}_i^\mu$ will be called a refinement of the
distribution which does not. In general, we shall say that a distribution
$P' = P'(\tilde{x}_i^\mu, \ldots, \tilde{y}_j^\nu)$ is a refinement of $P = P(x_i, \ldots, y_j)$ if

(4.1)  $P(x_i, \ldots, y_j) = \sum_{\mu \ldots \nu} P'(\tilde{x}_i^\mu, \ldots, \tilde{y}_j^\nu)$   (all i,...,j) .

We now state an important theorem concerning the behavior of correlation
under a refinement of a joint probability distribution:

THEOREM 2. P' is a refinement of P $\Rightarrow$ $\{X,\ldots,Y\}' \geq \{X,\ldots,Y\}$, so that
correlations never decrease upon refinement of a distribution. (Proof in
Appendix I, §3.)

As an example, suppose that we have a continuous probability density
P(x,y). By division of the axes into a finite number of intervals, $\bar{x}_i$, $\bar{y}_j$,
we arrive at a finite joint distribution $P_{ij}$, by integration of P(x,y) over
the rectangle whose sides are the intervals $\bar{x}_i$ and $\bar{y}_j$, and which
represents the probability that $X \in \bar{x}_i$ and $Y \in \bar{y}_j$. If we now subdivide the
intervals, the new distribution P' will be a refinement of P, and by
Theorem 2 the correlation {X,Y} computed from P' will never be less
than that computed from P. Theorem 2 is seen to be simply the mathematical
verification of the intuitive notion that closer analysis of a situation
in which quantities X and Y are dependent can never lessen the knowledge
about Y which can be obtained from X.
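The interval example can be tried out directly. The sketch below assumes, purely for illustration, the density P(x,y) = x + y on the unit square (which integrates to 1), integrates it exactly over each rectangle, and evaluates {X,Y} for grids of 1, 2, 4, 8 and 16 equal intervals per axis. Each grid refines the previous one, so by Theorem 2 the values should not decrease.

```python
import math

def cell_prob(a, b, c, d):
    """Exact integral of the assumed density P(x, y) = x + y over [a, b] x [c, d]."""
    return (b * b - a * a) / 2 * (d - c) + (d * d - c * c) / 2 * (b - a)

def correlation_on_grid(n):
    """{X,Y} of the finite distribution obtained from n equal intervals per axis."""
    edges = [i / n for i in range(n + 1)]
    P = [[cell_prob(edges[i], edges[i + 1], edges[j], edges[j + 1])
          for j in range(n)] for i in range(n)]
    px = [sum(row) for row in P]
    py = [sum(P[i][j] for i in range(n)) for j in range(n)]
    return sum(P[i][j] * math.log(P[i][j] / (px[i] * py[j]))
               for i in range(n) for j in range(n) if P[i][j] > 0)

values = [correlation_on_grid(n) for n in (1, 2, 4, 8, 16)]
print(values)   # starts at zero (a single cell) and never decreases
```

The grid of a single cell carries no correlation at all; the dependence between X and Y only becomes visible as the partition is refined.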

This theorem allows us to give a general definition of correlation
which will apply to joint distributions over completely arbitrary sets, i.e.,
for any probability measure on an arbitrary product space, in the following
manner:

Assume that we have a collection of arbitrary sets $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, and a
probability measure, $M_P(\mathcal{X} \times \mathcal{Y} \times \cdots \times \mathcal{Z})$, on their cartesian product. Let
$\mathcal{P}^\mu$ be any finite partition of $\mathcal{X}$ into subsets $\mathcal{X}_i^\mu$, $\mathcal{Y}$ into subsets
$\mathcal{Y}_j^\mu$, ..., and $\mathcal{Z}$ into subsets $\mathcal{Z}_k^\mu$, such that the sets
$\mathcal{X}_i^\mu \times \mathcal{Y}_j^\mu \times \cdots \times \mathcal{Z}_k^\mu$ of the cartesian product are measurable in the
probability measure $M_P$. Another partition $\mathcal{P}^\nu$ is a refinement of $\mathcal{P}^\mu$,
$\mathcal{P}^\nu \subseteq \mathcal{P}^\mu$, if $\mathcal{P}^\nu$ results from $\mathcal{P}^\mu$ by further subdivision of the subsets
$\mathcal{X}_i^\mu, \mathcal{Y}_j^\mu, \ldots, \mathcal{Z}_k^\mu$. Each partition $\mathcal{P}^\mu$ results in a finite probability
distribution, for which the correlation, $\{X,Y,\ldots,Z\}^{\mathcal{P}^\mu}$, is always defined
through (3.3). Furthermore a refinement of a partition leads to a refinement
of the probability distribution, so that by Theorem 2:

(4.8)  $\mathcal{P}^\nu \subseteq \mathcal{P}^\mu \;\Rightarrow\; \{X,Y,\ldots,Z\}^{\mathcal{P}^\nu} \geq \{X,Y,\ldots,Z\}^{\mathcal{P}^\mu}$ .

Now the set of all partitions is partially ordered under the refinement
relation. Moreover, because for any pair of partitions $\mathcal{P}$, $\mathcal{P}'$ there is
always a third partition $\mathcal{P}''$ which is a refinement of both (common lower
bound), the set of all partitions forms a directed set. 5 For a function, f,
on a directed set, S, one defines a directed set limit, lim f:

DEFINITION. lim f exists and is equal to a $\Leftrightarrow$ for every $\varepsilon > 0$ there
exists an $\alpha \in S$ such that $|f(\beta) - a| < \varepsilon$ for every $\beta \in S$ for which $\beta \leq \alpha$.

It is easily seen from the directed set property of common lower bounds
that if this limit exists it is necessarily unique.



4 A measure is a non-negative, countably additive set function, defined on some
subsets of a given set. It is a probability measure if the measure of the entire set
is unity. See Halmos [12].

5 See Kelley [15], p. 65.

By (4.8) the correlation $\{X,Y,\ldots,Z\}^{\mathcal{P}}$ is a monotone function on the
directed set of all partitions. Consequently the directed set limit, which
we shall take as the basic definition of the correlation $\{X,Y,\ldots,Z\}$,
always exists. (It may be infinite, but it is in every case well defined.)
Thus:

DEFINITION.  $\{X,Y,\ldots,Z\} = \lim \{X,Y,\ldots,Z\}^{\mathcal{P}}$ ,

and we have succeeded in our endeavor to give a completely general
definition of correlation, applicable to all types of distributions.

It is an immediate consequence of (4.8) that this directed set limit is
the supremum of $\{X,Y,\ldots,Z\}^{\mathcal{P}}$, so that:

(4.9)  $\{X,Y,\ldots,Z\} = \sup_{\mathcal{P}} \{X,Y,\ldots,Z\}^{\mathcal{P}}$ ,

which we could equally well have taken as the definition.

Due to the fact that the correlation is defined as a limit for discrete 
distributions, Theorem 1 and all of the relations (3.7) to (3.15), which 
contain only correlation brackets, remain true for arbitrary distributions. 
Only (3.11) and (3.12), which contain information terms, cannot be extended. 

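We can now prove an important theorem about correlation which concerns
its invariant nature. Let $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$ be arbitrary sets with probability
measure $M_P$ on their cartesian product. Let f be any one-one
mapping of $\mathcal{X}$ onto a set $\mathcal{U}$, g a one-one map of $\mathcal{Y}$ onto $\mathcal{V}$, ..., and h
a map of $\mathcal{Z}$ onto $\mathcal{W}$. Then a joint probability distribution over
$\mathcal{X} \times \mathcal{Y} \times \cdots \times \mathcal{Z}$ leads also to one over $\mathcal{U} \times \mathcal{V} \times \cdots \times \mathcal{W}$, where the probability
$M'_P$ induced on the product $\mathcal{U} \times \mathcal{V} \times \cdots \times \mathcal{W}$ is simply the measure which
assigns to each subset of $\mathcal{U} \times \mathcal{V} \times \cdots \times \mathcal{W}$ the measure which is the measure
of its image set in $\mathcal{X} \times \mathcal{Y} \times \cdots \times \mathcal{Z}$ for the original measure $M_P$. (We have
simply transformed to a new set of random variables: U = f(X), V = g(Y),
..., W = h(Z).) Consider any partition $\mathcal{P}$ of $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$ into the subsets
$\{\mathcal{X}_i\}, \{\mathcal{Y}_j\}, \ldots, \{\mathcal{Z}_k\}$ with probability distribution
$P_{ij \ldots k} = M_P(\mathcal{X}_i \times \mathcal{Y}_j \times \cdots \times \mathcal{Z}_k)$.
Then there is a corresponding partition $\mathcal{P}'$ of $\mathcal{U}, \mathcal{V}, \ldots, \mathcal{W}$ into the image
sets of the sets of $\mathcal{P}$, $\{\mathcal{U}_i\}, \{\mathcal{V}_j\}, \ldots, \{\mathcal{W}_k\}$, where $\mathcal{U}_i = f(\mathcal{X}_i)$, $\mathcal{V}_j = g(\mathcal{Y}_j)$, ...,
$\mathcal{W}_k = h(\mathcal{Z}_k)$. But the probability distribution for $\mathcal{P}'$ is the same as that
for $\mathcal{P}$, since $P'_{ij \ldots k} = M'_P(\mathcal{U}_i \times \mathcal{V}_j \times \cdots \times \mathcal{W}_k) = M_P(\mathcal{X}_i \times \mathcal{Y}_j \times \cdots \times \mathcal{Z}_k) = P_{ij \ldots k}$,
so that:

(4.10)  $\{X,Y,\ldots,Z\}^{\mathcal{P}} = \{U,V,\ldots,W\}^{\mathcal{P}'}$ .

Due to the correspondence between the $\mathcal{P}$'s and $\mathcal{P}'$'s we have that:

(4.11)  $\sup_{\mathcal{P}} \{X,Y,\ldots,Z\}^{\mathcal{P}} = \sup_{\mathcal{P}'} \{U,V,\ldots,W\}^{\mathcal{P}'}$ ,

and by virtue of (4.9) we have proved the following theorem:

THEOREM 3. $\{X,Y,\ldots,Z\} = \{U,V,\ldots,W\}$, where $\mathcal{U}, \mathcal{V}, \ldots, \mathcal{W}$ are any
one-one images of $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, respectively. In other notation:
$\{X,Y,\ldots,Z\} = \{f(X), g(Y), \ldots, h(Z)\}$ for all one-one functions f, g,..., h.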



This means that changing variables to functionally related variables 
preserves the correlation. Again this is plausible on intuitive grounds, 
since a knowledge of f(x) is just as good as knowledge of x, provided 
that f is one-one. 
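A small numerical illustration of Theorem 3, offered as a sketch under our own assumptions: the joint distribution below is invented, and `transform` is a hypothetical helper that pushes a finite distribution through a pair of maps. One-one maps leave the correlation untouched; a map that merges values amounts to coarsening the partition, so by Theorem 2 it can only lose correlation.

```python
import math

def correlation(joint):
    """{X,Y} = Exp[ln P(x,y) / (P(x)P(y))] for a joint distribution {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    return sum(q * math.log(q / (px[x] * py[y]))
               for (x, y), q in joint.items() if q > 0)

def transform(joint, f, g):
    """Distribution of (f(X), g(Y)); probabilities of colliding images are added."""
    out = {}
    for (x, y), q in joint.items():
        key = (f(x), g(y))
        out[key] = out.get(key, 0.0) + q
    return out

# Made-up joint distribution on {-1, 0, 1}^2 in which Y tends to equal X.
joint = {(x, y): (0.2 if x == y else 0.2 / 3)
         for x in (-1, 0, 1) for y in (-1, 0, 1)}

base = correlation(joint)
one_one = correlation(transform(joint, lambda x: x ** 3 + 5, lambda y: math.exp(y)))
many_one = correlation(transform(joint, lambda x: x * x, lambda y: y))  # merges -1 and 1
print(base, one_one, many_one)
```

Here `one_one` agrees with `base` up to rounding, while `many_one` comes out smaller, because squaring X discards its sign.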

A special consequence of Theorem 3 is that for any continuous proba- 
bility density P(x, y) over real numbers the correlation between f(x) 
and g(y) is the same as between x and y, where f and g are any 
real valued one-one functions. As an example consider a probability dis- 
tribution for the position of two particles, so that the random variables 
are the position coordinates. Theorem 3 then assures us that the position 
correlation is independent of the coordinate system, even if different 
coordinate systems are used for each particle! Also for a joint distribu- 
tion for a pair of events in space-time the correlation is invariant to arbi- 
trary space-time coordinate transformations, again even allowing different 
transformations for the coordinates of each event.

These examples illustrate clearly the intrinsic nature of the correlation
of various groups for joint probability distributions, which is implied
by its invariance against arbitrary (one-one) transformations of the random 
variables. These correlation quantities are thus fundamental properties 
of probability distributions. A correlation is an absolute rather than rela- 
tive quantity, in the sense that the correlation between (numerical valued) 
random variables is completely independent of the scale of measurement 
chosen for the variables. 

§5. Information for general distributions 

Although we now have a definition of correlation applicable to all 
probability distributions, we have not yet extended the definition of infor- 
mation past finite distributions. In order to make this extension we first 
generalize the definition that we gave for discrete distributions to a defi- 
nition of relative information for a random variable, relative to a given 
underlying measure, called the information measure, on the values of the 
random variable. 

If we assign a measure to the set of values of a random variable, X,
which is simply the assignment of a positive number $a_i$ to each value $x_i$
in the finite case, we define the information of a probability distribution
$P(x_i)$ relative to this information measure to be:

(5.1)  $I_X = \sum_i P(x_i) \ln \frac{P(x_i)}{a_i} = \mathrm{Exp}\left[\ln \frac{P(x_i)}{a_i}\right]$ .

If we have a joint distribution of random variables X,Y,...,Z, with
information measures $\{a_i\}, \{b_j\}, \ldots, \{c_k\}$ on their values, then we define
the total information relative to these measures to be:

(5.2)  $I_{XY \ldots Z} = \sum_{ij \ldots k} P(x_i,y_j,\ldots,z_k) \ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k} = \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k}\right]$ ,

so that the information measure on the cartesian product set is always
taken to be the product measure of the individual information measures.

We shall now alter our previous position slightly and consider
information as always being defined relative to some information measure, so
that our previous definition of information is to be regarded as the
information relative to the measure for which all the $a_i$'s, $b_j$'s,... and $c_k$'s
are taken to be unity, which we shall henceforth call the uniform measure.

Let us now compute the correlation $\{X,Y,\ldots,Z\}'$ by (3.4) using the
relative information:

(5.3)  $\{X,Y,\ldots,Z\}' = I'_{XY \ldots Z} - I'_X - I'_Y - \cdots - I'_Z$

       $= \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k}\right] - \mathrm{Exp}\left[\ln \frac{P(x_i)}{a_i}\right] - \cdots - \mathrm{Exp}\left[\ln \frac{P(z_k)}{c_k}\right]$

       $= \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{P(x_i)P(y_j)\cdots P(z_k)}\right] = \{X,Y,\ldots,Z\}$ ,

so that the correlation for discrete distributions, as defined by (3.4), is
independent of the choice of information measure, and the correlation
remains an absolute, not relative quantity. It can, however, be computed
from the information relative to any information measure through (3.4).
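The cancellation in (5.3) is easy to confirm numerically. In the sketch below the joint distribution and both sets of weights are invented, and `relative_information` and `correlation_relative` are hypothetical helper names; the product measure $a_i b_j$ is used on pairs, as prescribed after (5.2).

```python
import math

def relative_information(p, measure):
    """(5.1)/(5.2): sum_i P_i ln(P_i / m_i), with m the information measure."""
    return sum(q * math.log(q / measure[k]) for k, q in p.items() if q > 0)

def correlation_relative(joint, a, b):
    """{X,Y}' = I'_XY - I'_X - I'_Y, using the product measure a_i * b_j on pairs."""
    px, py = {}, {}
    for (x, y), q in joint.items():
        px[x] = px.get(x, 0.0) + q
        py[y] = py.get(y, 0.0) + q
    ab = {(x, y): a[x] * b[y] for (x, y) in joint}
    return (relative_information(joint, ab)
            - relative_information(px, a) - relative_information(py, b))

joint = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}   # made-up numbers
uniform = correlation_relative(joint, {0: 1.0, 1: 1.0}, {0: 1.0, 1: 1.0})
weighted = correlation_relative(joint, {0: 0.3, 1: 7.0}, {0: 2.5, 1: 0.01})
print(uniform, weighted)   # the two agree up to rounding
```

The individual informations change substantially with the weights; only their combination into the correlation is insensitive to them.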

If we consider refinements of our distributions, as before, and realize
that such a refinement is also a refinement of the information measure,
then we can prove a relation analogous to Theorem 2:

THEOREM 4. The information of a distribution relative to a given informa- 
tion measure never decreases under refinement. (Proof in Appendix I.) 



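Therefore, just as for correlation, we can define the information of a
probability measure $M_P$ on the cartesian product of arbitrary sets
$\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, relative to the information measures $\mu_X, \mu_Y, \ldots, \mu_Z$ on the
individual sets, by considering finite partitions $\mathcal{P}$ into subsets $\{\mathcal{X}_i\}$,
$\{\mathcal{Y}_j\}, \ldots, \{\mathcal{Z}_k\}$, for which we take as the definition of the information:

(5.4)  $I^{\mathcal{P}}_{XY \ldots Z} = \sum_{ij \ldots k} M_P(\mathcal{X}_i \times \mathcal{Y}_j \times \cdots \times \mathcal{Z}_k) \ln \frac{M_P(\mathcal{X}_i \times \mathcal{Y}_j \times \cdots \times \mathcal{Z}_k)}{\mu_X(\mathcal{X}_i)\,\mu_Y(\mathcal{Y}_j) \cdots \mu_Z(\mathcal{Z}_k)}$ .

Then $I^{\mathcal{P}}_{XY \ldots Z}$ is, as was $\{X,Y,\ldots,Z\}^{\mathcal{P}}$, a monotone function upon the
directed set of partitions (by Theorem 4), and as before we take the
directed set limit for our definition:

(5.5)  $I_{XY \ldots Z} = \lim I^{\mathcal{P}}_{XY \ldots Z} = \sup_{\mathcal{P}} I^{\mathcal{P}}_{XY \ldots Z}$ ,

which is then the information relative to the information measures
$\mu_X, \mu_Y, \ldots, \mu_Z$.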

Now, for functions f, g on a directed set the existence of lim f and
lim g is a sufficient condition for the existence of lim(f+g), which is
then lim f + lim g, provided that this is not indeterminate. Therefore:

THEOREM 5. $\{X,\ldots,Y\} = \lim \{X,\ldots,Y\}^{\mathcal{P}} = \lim \left[I^{\mathcal{P}}_{X \ldots Y} - I^{\mathcal{P}}_X - \cdots - I^{\mathcal{P}}_Y\right] = I_{X \ldots Y} - I_X - \cdots - I_Y$,
where the information is taken relative to any information measure for
which the expression is not indeterminate. It is sufficient for the validity
of the above expression that the basic measures $\mu_X, \ldots, \mu_Y$ be such that
none of the marginal informations $I_X \ldots I_Y$ shall be positively infinite.

The latter statement holds since, because of the general relation
$I_{X \ldots Y} \geq I_X + \cdots + I_Y$, the determinateness of the expression is guaranteed
so long as all of the $I_X, \ldots, I_Y$ are $< +\infty$.

Henceforth, unless otherwise noted, we shall understand that information
is to be computed with respect to the uniform measure for discrete
distributions, and Lebesgue measure for continuous distributions over real
numbers. In case of a mixed distribution, with a continuous density
P(x,y,...,z) plus discrete "lumps" $P'(x_i,y_j,\ldots,z_k)$, we shall understand
the information measure to be the uniform measure over the discrete range,
and Lebesgue measure over the continuous range. These conventions
then lead us to the expressions:

(5.6)  $I_{XY \ldots Z} = \begin{cases}
         \sum_{ij \ldots k} P(x_i,y_j,\ldots,z_k) \ln P(x_i,y_j,\ldots,z_k) & \text{(discrete)} \\
         \int P(x,y,\ldots,z) \ln P(x,y,\ldots,z)\, dx\, dy \cdots dz & \text{(continuous)} \\
         \sum_{i \ldots k} P'(x_i,\ldots,z_k) \ln P'(x_i,\ldots,z_k) + \int P(x,\ldots,z) \ln P(x,\ldots,z)\, dx \cdots dz & \text{(mixed)}
       \end{cases}$
       (unless otherwise noted)

The mixed case occurs often in quantum mechanics, for quantities
which have both a discrete and continuous spectrum.

§6. Example: Information decay in stochastic processes 

As an example illustrating the usefulness of the concept of relative
information we shall consider briefly stochastic processes. 6 Suppose that
we have a stationary Markov 7 process with a finite number of states $S_i$,
and that the process occurs at discrete (integral) times 1,2,...,n,..., at
which times the transition probability from the state $S_i$ to the state $S_j$
is $T_{ij}$. The probabilities $T_{ij}$ then form what is called a stochastic



6 See Feller [10], or Doob [6].

7 A Markov process is a stochastic process whose future development depends
only upon its present state, and not on its past history.

matrix, i.e., the elements are between 0 and 1, and $\sum_j T_{ij} = 1$ for all
i. If at any time k the probability distribution over the states is $\{P_i^k\}$
then at the next time the probabilities will be $P_j^{k+1} = \sum_i P_i^k T_{ij}$.

In the special case where the matrix is doubly-stochastic, which
means that $\sum_i T_{ij}$, as well as $\sum_j T_{ij}$, equals unity, and which amounts
to a principle of detailed balancing holding, it is known that the entropy
of a probability distribution over the states, defined as $H = -\sum_i P_i \ln P_i$,
is a monotone increasing function of the time. This entropy is, however,
simply the negative of the information relative to the uniform measure.

One can extend this result to more general stochastic processes only
if one uses the more general definition of relative information. For an
arbitrary stationary process the choice of an information measure which is
stationary, i.e., for which

(6.1)  $a_j = \sum_i a_i T_{ij}$   (all j)

leads to the desired result. In this case the relative information,

(6.2)  $I = \sum_i P_i \ln \frac{P_i}{a_i}$ ,

is a monotone decreasing function of time and constitutes a suitable
basis for the definition of the entropy H = -I. Note that this definition
leads to the previous result for doubly-stochastic processes, since the
uniform measure, $a_i = 1$ (all i), is obviously stationary in this case.
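The decay of relative information can be watched in a toy chain. The sketch below is an illustration under our own assumptions: the 3-state matrix is invented and is not doubly-stochastic, and the stationary measure of (6.1) is approximated simply by iterating the chain many times rather than by solving the linear equations.

```python
import math

T = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]    # made-up stochastic matrix: each row sums to 1

def step(p, T):
    """One time step, p_j <- sum_i p_i T_ij."""
    n = len(T)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

def relative_information(p, a):
    """(6.2): I = sum_i P_i ln(P_i / a_i)."""
    return sum(pi * math.log(pi / ai) for pi, ai in zip(p, a) if pi > 0)

# Stationary measure (6.1), a_j = sum_i a_i T_ij, approximated by long iteration.
a = [1 / 3, 1 / 3, 1 / 3]
for _ in range(2000):
    a = step(a, T)

p = [0.0, 0.0, 1.0]      # start concentrated on the third state
history = []
for _ in range(30):
    history.append(relative_information(p, a))
    p = step(p, T)

print(history[:5])       # non-increasing, tending to zero as p approaches a
```

With the measure normalized to total weight 1, as here, the relative information is bounded below by zero and reaches it only when the distribution coincides with the stationary one.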

One can furthermore drop the requirement that the stochastic process
be stationary, and even allow that there are completely different sets of
states, $\{S_i^n\}$, at each time n, so that the process is now given by a
sequence of matrices $T_{ij}^n$ representing the transition probability at time n
from state $S_i^n$ to state $S_j^{n+1}$. In this case probability distributions
change according to:

(6.3)  $P_j^{n+1} = \sum_i P_i^n T_{ij}^n$ .

If we then choose any time-dependent information measure which satisfies
the relations:

(6.4)  $a_j^{n+1} = \sum_i a_i^n T_{ij}^n$   (all j, n) ,

then the information of a probability distribution is again monotone
decreasing with time. (Proof in Appendix I.)

All of these results are easily extended to the continuous case, and 
we see that the concept of relative information allows us to define entropy 
for quite general stochastic processes. 

§7. Example: Conservation of information in classical mechanics 

As a second illustrative example we consider briefly the classical
mechanics of a group of particles. The system at any instant is represented
by a point, $(x^1,y^1,z^1,p_x^1,p_y^1,p_z^1,\ldots,x^n,y^n,z^n,p_x^n,p_y^n,p_z^n)$, in the phase
space of all position and momentum coordinates. The natural motion of
the system then carries each point into another, defining a continuous
transformation of the phase space into itself. According to Liouville's
theorem the measure of a set of points of the phase space is invariant
under this transformation. 8 This invariance of measure implies that if we
begin with a probability distribution over the phase space, rather than a
single point, the total information

(7.1)  $I_{\mathrm{total}} = I_{X^1 Y^1 Z^1 P_x^1 P_y^1 P_z^1 \ldots X^n Y^n Z^n P_x^n P_y^n P_z^n}$ ,

which is the information of the joint distribution for all positions and
momenta, remains constant in time.



8 See Khinchin [16], p. 15.

In order to see that the total information is conserved, consider any
partition $\mathcal{P}$ of the phase space at one time, $t_0$, with its information
relative to the phase space measure, $I^{\mathcal{P}}(t_0)$. At a later time $t_1$ a
partition $\mathcal{P}'$, into the image sets of $\mathcal{P}$ under the mapping of the space into
itself, is induced, for which the probabilities for the sets of $\mathcal{P}'$ are the
same as those of the corresponding sets of $\mathcal{P}$, and furthermore for which
the measures are the same, by Liouville's theorem. Thus corresponding
to each partition $\mathcal{P}$ at time $t_0$ with information $I^{\mathcal{P}}(t_0)$, there is a
partition $\mathcal{P}'$ at time $t_1$ with information $I^{\mathcal{P}'}(t_1)$, which is the same:

(7.2)  $I^{\mathcal{P}'}(t_1) = I^{\mathcal{P}}(t_0)$ .

Due to the correspondence of the $\mathcal{P}$'s and $\mathcal{P}'$'s the supremums of each
over all partitions must be equal, and by (5.5) we have proved that

(7.3)  $I_{\mathrm{total}}(t_1) = I_{\mathrm{total}}(t_0)$ ,

and the total information is conserved.

Now it is known that the individual (marginal) position and momentum
distributions tend to decay, except for rare fluctuations, into the uniform
and Maxwellian distributions respectively, for which the classical entropy
is a maximum. This entropy is, however, except for the factor of
Boltzmann's constant, simply the negative of the marginal information

(7.4)  $I_{\mathrm{marginal}} = I_{X^1} + I_{Y^1} + I_{Z^1} + \cdots + I_{P_x^n} + I_{P_y^n} + I_{P_z^n}$ ,

which thus tends towards a minimum. But this decay of marginal information
is exactly compensated by an increase of the total correlation information

(7.5)  $\{\mathrm{total}\} = I_{\mathrm{total}} - I_{\mathrm{marginal}}$ ,

since the total information remains constant. Therefore, if one were to
define the total entropy to be the negative of the total information, one
could replace the usual second law of thermodynamics by a law of
conservation of total entropy, where the increase in the standard (marginal)
entropy is exactly compensated by a (negative) correlation entropy. The
usual second law then results simply from our renunciation of all
correlation knowledge (Stosszahlansatz), and not from any intrinsic behavior of
classical systems. The situation for classical mechanics is thus in sharp
contrast to that of stochastic processes, which are intrinsically irreversible.
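The bookkeeping of (7.1) to (7.5) can be imitated in a toy model. The sketch below is only an analogy under stated assumptions: a 16 x 16 grid of (q, p) values stands in for phase space, counting measure stands in for Liouville measure, and the invertible map (q, p) -> (q + p, q + 2p) mod 16 stands in for the natural motion. None of this is taken from the text; on a finite grid the motion is eventually periodic, so only the first few steps are shown.

```python
import math

N = 16   # grid size of the toy phase space (an assumption of this sketch)

def info(p):
    """Information sum p ln p relative to the uniform (counting) measure."""
    return sum(w * math.log(w) for w in p.values() if w > 0)

def marginals(joint):
    pq, pp = {}, {}
    for (q, p), w in joint.items():
        pq[q] = pq.get(q, 0.0) + w
        pp[p] = pp.get(p, 0.0) + w
    return pq, pp

def flow(joint):
    """One step of (q, p) -> (q + p, q + 2p) mod N, a bijection of the grid."""
    return {((q + p) % N, (q + 2 * p) % N): w for (q, p), w in joint.items()}

# Start uniform on a 4 x 4 block: Q and P are independent, so the correlation is zero.
joint = {(q, p): 1 / 16 for q in range(4) for p in range(4)}

rows = []
for t in range(4):
    pq, pp = marginals(joint)
    total = info(joint)
    marginal = info(pq) + info(pp)
    rows.append((t, total, marginal, total - marginal))
    joint = flow(joint)

for row in rows:
    print("t=%d  I_total=%.4f  I_marginal=%.4f  correlation=%.4f" % row)
```

Because the map merely permutes grid points, the total information stays fixed, while after the first step the marginal information has dropped and the correlation has risen by the same amount.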



III. QUANTUM MECHANICS 



Having mathematically formulated the ideas of information and correla- 
tion for probability distributions, we turn to the field of quantum mechanics. 
In this chapter we assume that the states of physical systems are repre- 
sented by points in a Hilbert space, and that the time dependence of the 
state of an isolated system is governed by a linear wave equation. 

It is well known that state functions lead to distributions over eigen- 
values of Hermitian operators (square amplitudes of the expansion coeffi- 
cients of the state in terms of the basis consisting of eigenfunctions of 
the operator) which have the mathematical properties of probability distri- 
butions (non-negative and normalized). The standard interpretation of 
quantum mechanics regards these distributions as actually giving the 
probabilities that the various eigenvalues of the operator will be observed, 
when a measurement represented by the operator is performed. 

A feature of great importance to our interpretation is the fact that a 
state function of a composite system leads to joint distributions over sub- 
system quantities, rather than independent subsystem distributions, i.e., 
the quantities in different subsystems may be correlated with one another. 
The first section of this chapter is accordingly devoted to the development 
of the formalism of composite systems, and the connection of composite 
system states and their derived joint distributions with the various possible 
subsystem conditional and marginal distributions. We shall see that there 
exist relative state functions which correctly give the conditional
distributions for all subsystem operators, while marginal distributions cannot
generally be represented by state functions, but only by density matrices. 

In Section 2 the concepts of information and correlation, developed 
in the preceding chapter, are applied to quantum mechanics, by defining
information and correlation for operators on systems with prescribed
states. It is also shown that for composite systems there exists a quantity 
which can be thought of as the fundamental correlation between subsys- 
tems, and a closely related canonical representation of the composite sys- 
tem state. In addition, a stronger form of the uncertainty principle, phrased 
in information language, is indicated. 

The third section takes up the question of measurement in quantum 
mechanics, viewed as a correlation producing interaction between physical 
systems. A simple example of such a measurement is given and discussed. 
Finally some general consequences of the superposition principle are con- 
sidered. 

It is convenient at this point to introduce some notational conventions.
We shall be concerned with points $\psi$ in a Hilbert space $\mathcal{H}$, with scalar
product $(\psi_1, \psi_2)$. A state is a point $\psi$ for which $(\psi,\psi) = 1$. For any
linear operator A we define a functional, $\langle A \rangle\psi$, called the expectation
of A for $\psi$, to be:

       $\langle A \rangle\psi = (\psi, A\psi)$ .

A class of operators of particular interest is the class of projection
operators. The operator $[\phi]$, called the projection on $\phi$, is defined through:

       $[\phi]\psi = (\phi,\psi)\phi$ .

For a complete orthonormal set $\{\phi_i\}$ and a state $\psi$ we define a
square-amplitude distribution, $P_i$, called the distribution of $\psi$ over
$\{\phi_i\}$ through:

       $P_i = |(\phi_i,\psi)|^2 = \langle [\phi_i] \rangle\psi$ .

In the probabilistic interpretation this distribution represents the
probability distribution over the results of a measurement with eigenstates $\phi_i$,
performed upon a system in the state $\psi$. (Hereafter when referring to the
probabilistic interpretation we shall say briefly "the probability that the
system will be found in $\phi_i$", rather than the more cumbersome phrase
"the probability that the measurement of a quantity B, with
eigenfunctions $\{\phi_i\}$, shall yield the eigenvalue corresponding to $\phi_i$," which is
meant.)

For two Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, we form the direct product
Hilbert space $\mathcal{H}_3 = \mathcal{H}_1 \otimes \mathcal{H}_2$ (tensor product) which is taken to be the space
of all possible 1 sums of formal products of points of $\mathcal{H}_1$ and $\mathcal{H}_2$, i.e.,
the elements of $\mathcal{H}_3$ are those of the form $\sum_i a_i \xi_i \eta_i$ where $\xi_i \in \mathcal{H}_1$ and
$\eta_i \in \mathcal{H}_2$. The scalar product in $\mathcal{H}_3$ is taken to be
$\left(\sum_i a_i \xi_i \eta_i, \sum_j b_j \xi_j \eta_j\right) = \sum_{ij} a_i^* b_j (\xi_i,\xi_j)(\eta_i,\eta_j)$.
It is then easily seen that if $\{\xi_i\}$ and $\{\eta_i\}$ form
complete orthonormal sets in $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively, then the set of
all formal products $\{\xi_i \eta_j\}$ is a complete orthonormal set in $\mathcal{H}_3$. For any
pair of operators A, B, in $\mathcal{H}_1$ and $\mathcal{H}_2$ there corresponds an operator
$C = A \otimes B$, the direct product of A and B, in $\mathcal{H}_3$, which can be defined
by its effect on the elements $\xi_i \eta_j$ of $\mathcal{H}_3$:

       $C \xi_i \eta_j = A \otimes B\, \xi_i \eta_j = (A\xi_i)(B\eta_j)$ .

§1. Composite systems 

It is well known that if the states of a pair of systems $S_1$ and $S_2$,
are represented by points in Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively,
then the states of the composite system $S = S_1 + S_2$ (the two systems
$S_1$ and $S_2$ regarded as a single system S) are represented correctly by
points of the direct product $\mathcal{H}_1 \otimes \mathcal{H}_2$. This fact has far reaching
consequences which we wish to investigate in some detail. Thus if $\{\xi_i\}$ is a
complete orthonormal set for $\mathcal{H}_1$, and $\{\eta_j\}$ for $\mathcal{H}_2$, the general state of
$S = S_1 + S_2$ has the form:

(1.1)  $\psi^S = \sum_{ij} a_{ij} \xi_i \eta_j$   $\left(\sum_{ij} a_{ij}^* a_{ij} = 1\right)$ .



1 More rigorously, one considers only finite sums, then completes the resulting
space to arrive at $\mathcal{H}_1 \otimes \mathcal{H}_2$.

In this case we shall call $P_{ij} = a_{ij}^* a_{ij}$ the joint square-amplitude
distribution of $\psi^S$ over $\{\xi_i\}$ and $\{\eta_j\}$. In the standard probabilistic
interpretation $a_{ij}^* a_{ij}$ represents the joint probability that $S_1$ will be found in
the state $\xi_i$ and $S_2$ will be found in the state $\eta_j$. Following the
probabilistic model we now derive some distributions from the state $\psi^S$. Let
A be a Hermitian operator in $S_1$ with eigenfunctions $\phi_i$ and eigenvalues
$\lambda_i$, and B an operator in $S_2$ with eigenfunctions $\theta_j$ and eigenvalues
$\mu_j$. Then the joint distribution of $\psi^S$ over $\{\phi_i\}$ and $\{\theta_j\}$, $P_{ij}$,
is:

(1.2)  $P_{ij} = P(\phi_i \text{ and } \theta_j) = |(\phi_i \theta_j, \psi^S)|^2$ .

The marginal distributions, of $\psi^S$ over $\{\phi_i\}$ and of $\psi^S$ over $\{\theta_j\}$,
are:

(1.3)  $P_i = P(\phi_i) = \sum_j P_{ij} = \sum_j |(\phi_i \theta_j, \psi^S)|^2$ ,
       $P_j = P(\theta_j) = \sum_i P_{ij} = \sum_i |(\phi_i \theta_j, \psi^S)|^2$ ,

and the conditional distributions $P_i^j$ and $P_j^i$ are:

(1.4)  $P_i^j = P(\phi_i \text{ conditioned on } \theta_j) = \frac{P_{ij}}{P_j}$ ,
       $P_j^i = P(\theta_j \text{ conditioned on } \phi_i) = \frac{P_{ij}}{P_i}$ .

We now define the conditional expectation of an operator A on $S_1$,
conditioned on $\theta_j$ in $S_2$, denoted by $\mathrm{Exp}^{\theta_j}[A]$, to be:

(1.5)  $\mathrm{Exp}^{\theta_j}[A] = \sum_i \lambda_i P_i^j = (1/P_j) \sum_i P_{ij} \lambda_i$
       $= (1/P_j) \sum_i |(\phi_i \theta_j, \psi^S)|^2 (\phi_i, A\phi_i)$ ,

and we define the marginal expectation of A on $S_1$ to be:

(1.6)  $\mathrm{Exp}[A] = \sum_i P_i \lambda_i = \sum_{ij} \lambda_i P_{ij} = \sum_{ij} |(\phi_i \theta_j, \psi^S)|^2 (\phi_i, A\phi_i)$ .

We shall now introduce projection operators to get more convenient
forms of the conditional and marginal expectations, which will also exhibit
more clearly the degree of dependence of these quantities upon the chosen
basis $\{\phi_i \theta_j\}$. Let the operators $[\phi_i]$ and $[\theta_j]$ be the projections on
$\phi_i$ in $S_1$ and $\theta_j$ in $S_2$ respectively, and let $I^1$ and $I^2$ be the
identity operators in $S_1$ and $S_2$. Then, making use of the identity
$\psi^S = \sum_{ij} (\phi_i \theta_j, \psi^S) \phi_i \theta_j$ for any complete orthonormal set $\{\phi_i \theta_j\}$, we have:

(1.7)  $\langle [\phi_i][\theta_j] \rangle\psi^S = (\psi^S, [\phi_i][\theta_j]\psi^S)$
       $= \left(\sum_{k\ell} (\phi_k \theta_\ell, \psi^S) \phi_k \theta_\ell,\; [\phi_i][\theta_j] \sum_{mn} (\phi_m \theta_n, \psi^S) \phi_m \theta_n\right)$
       $= \sum_{k\ell mn} (\phi_k \theta_\ell, \psi^S)^* (\phi_m \theta_n, \psi^S)\, \delta_{ki} \delta_{\ell j} \delta_{im} \delta_{jn}$
       $= (\phi_i \theta_j, \psi^S)^* (\phi_i \theta_j, \psi^S) = P_{ij}$ ,

so that the joint distribution is given simply by $\langle [\phi_i][\theta_j] \rangle\psi^S$.
For the marginal distribution we have:

(1.8)  $P_i = \sum_j P_{ij} = \sum_j \langle [\phi_i][\theta_j] \rangle\psi^S = \left\langle [\phi_i] \left(\sum_j [\theta_j]\right) \right\rangle\psi^S = \langle [\phi_i] I^2 \rangle\psi^S$ ,

and we see that the marginal distribution over the $\phi_i$ is independent of
the set $\{\theta_j\}$ chosen in $S_2$. This result has the consequence in the
ordinary interpretation that the expected outcome of measurement in one
subsystem of a composite system is not influenced by the choice of quantity
to be measured in the other subsystem. This expectation is, in fact, the
expectation for the case in which no measurement at all (identity operator)
is performed in the other subsystem. Thus no measurement in $S_2$ can
affect the expected outcome of a measurement in $S_1$, so long as the
result of any $S_2$ measurement remains unknown. The case is quite different,
however, if this result is known, and we must turn to the conditional
distributions and expectations in such a case.
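The independence of the marginal distribution from the basis chosen in $S_2$ can be seen in a two-by-two example. The sketch below rests on our own assumptions: the state has real coefficients `a[i][j]` $= (\phi_i \theta_j, \psi^S)$ with invented values, and the second basis is changed by a plane rotation through an arbitrary angle.

```python
import math

# Coefficients a[i][j] = (phi_i theta_j, psi^S) of a made-up composite state.
a = [[0.6, 0.0],
     [0.0, 0.8]]     # 0.36 + 0.64 = 1, so the state is normalized

def rotate_second_basis(a, angle):
    """Coefficients of the same state when {theta_j} is replaced by the rotated
    orthonormal set theta'_0 = c*theta_0 + s*theta_1, theta'_1 = -s*theta_0 + c*theta_1."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c * row[0] + s * row[1], -s * row[0] + c * row[1]] for row in a]

def joint(a):
    """Joint distribution P_ij = |a_ij|^2, as in (1.2)."""
    return [[abs(x) ** 2 for x in row] for row in a]

def marginal_first(a):
    """Marginal over the phi_i, as in (1.3)/(1.8): P_i = sum_j P_ij."""
    return [sum(row) for row in joint(a)]

b = rotate_second_basis(a, 0.7)
print(joint(a), joint(b))                     # the joint distributions differ
print(marginal_first(a), marginal_first(b))   # the S_1 marginals agree up to rounding
```

The joint distribution, and with it every conditional distribution, does depend on the quantity chosen in $S_2$; only the unconditioned statistics of $S_1$ do not.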

We now introduce the concept of a relative state-function, which will
play a central role in our interpretation of pure wave mechanics. Consider
a composite system $S = S_1 + S_2$ in the state $\psi^S$. To every state $\eta$ of
$S_2$ we associate a state of $S_1$, $\psi^{\eta}_{\mathrm{rel}}$, called the relative state in $S_1$ for
$\eta$ in $S_2$, through:

(1.9)  DEFINITION.  $\psi^{\eta}_{\mathrm{rel}} = N \sum_i (\phi_i \eta, \psi^S) \phi_i$ ,

where $\{\phi_i\}$ is any complete orthonormal set in $S_1$ and N is a
normalization constant. 2

The first property of $\psi^{\eta}_{\mathrm{rel}}$ is its uniqueness, 3 i.e., its dependence
upon the choice of the basis $\{\phi_i\}$ is only apparent. To prove this, choose
another basis $\{\xi_k\}$, with $\phi_i = \sum_k b_{ik} \xi_k$. Then $\sum_i b_{ij}^* b_{ik} = \delta_{jk}$, and:

       $\sum_i (\phi_i \eta, \psi^S) \phi_i = \sum_i \left(\sum_j b_{ij} \xi_j \eta, \psi^S\right) \left(\sum_k b_{ik} \xi_k\right)$
       $= \sum_{jk} \left(\sum_i b_{ij}^* b_{ik}\right) (\xi_j \eta, \psi^S) \xi_k = \sum_{jk} \delta_{jk} (\xi_j \eta, \psi^S) \xi_k$
       $= \sum_k (\xi_k \eta, \psi^S) \xi_k$ .

The second property of the relative state, which justifies its name, is
that $\psi^{\theta_j}_{\mathrm{rel}}$ correctly gives the conditional expectations of all operators in
$S_1$, conditioned by the state $\theta_j$ in $S_2$. As before let A be an operator
in $S_1$ with eigenstates $\phi_i$ and eigenvalues $\lambda_i$. Then:

2 In case $\sum_i (\phi_i \eta, \psi^S) \phi_i = 0$ (unnormalizable) then choose any function for the
relative function. This ambiguity has no consequences of any importance to us.
See in this connection the remarks on p. 40.

3 Except if $\sum_i (\phi_i \eta, \psi^S) \phi_i = 0$. There is still, of course, no dependence upon
the basis.

(1.10)  $\langle A \rangle\psi^{\theta_j}_{\mathrm{rel}} = (\psi^{\theta_j}_{\mathrm{rel}}, A\psi^{\theta_j}_{\mathrm{rel}})$
        $= \left(N \sum_i (\phi_i \theta_j, \psi^S) \phi_i,\; A N \sum_m (\phi_m \theta_j, \psi^S) \phi_m\right)$
        $= N^2 \sum_{im} (\phi_i \theta_j, \psi^S)^* (\phi_m \theta_j, \psi^S) \lambda_i \delta_{im}$
        $= N^2 \sum_i \lambda_i P_{ij}$ .

At this point the normalizer N can be conveniently evaluated by using
(1.10) to compute: $\langle I^1 \rangle\psi^{\theta_j}_{\mathrm{rel}} = N^2 \sum_i P_{ij} = N^2 P_j = 1$, so that

(1.11)  $N^2 = 1/P_j$ .

Substitution of (1.11) in (1.10) yields:

(1.12)  $\langle A \rangle\psi^{\theta_j}_{\mathrm{rel}} = (1/P_j) \sum_i \lambda_i P_{ij} = \sum_i \lambda_i P_i^j = \mathrm{Exp}^{\theta_j}[A]$ ,

and we see that the conditional expectations of operators are given by the
relative states. (This includes, of course, the conditional distributions
themselves, since they may be obtained as expectations of projection
operators.)
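As a concrete check of (1.9) to (1.12), the following sketch constructs the relative states of a small composite state and compares their expectations with the conditional expectations computed directly from the joint distribution. The coefficients and eigenvalues are invented, the operator A is assumed diagonal in the $\phi_i$ basis (as in the text), and the function names are our own.

```python
# Coefficients a[i][j] = (phi_i theta_j, psi^S); the numbers are made up.
a = [[0.5 + 0.0j, 0.5j],
     [0.1 + 0.0j, 0.7 + 0.0j]]
norm = sum(abs(x) ** 2 for row in a for x in row) ** 0.5
a = [[x / norm for x in row] for row in a]       # normalize psi^S

lam = [1.0, -1.0]    # eigenvalues lambda_i of A, taken diagonal in the phi_i basis

def relative_state(a, j):
    """(1.9) with eta = theta_j: components N * (phi_i theta_j, psi^S) on phi_i.
    Assumes P_j > 0, so that the state is normalizable."""
    comps = [row[j] for row in a]
    P_j = sum(abs(c) ** 2 for c in comps)        # marginal probability of theta_j
    N = P_j ** -0.5                              # (1.11): N^2 = 1 / P_j
    return [N * c for c in comps], P_j

def expectation_in_relative_state(a, j):
    """<A> in the relative state, for A diagonal with eigenvalues lam."""
    psi_rel, _ = relative_state(a, j)
    return sum(l * abs(c) ** 2 for l, c in zip(lam, psi_rel))

def conditional_expectation(a, j):
    """(1.5): (1/P_j) * sum_i lambda_i P_ij, straight from the joint distribution."""
    P_ij = [abs(row[j]) ** 2 for row in a]
    return sum(l * p for l, p in zip(lam, P_ij)) / sum(P_ij)

for j in (0, 1):
    print(j, expectation_in_relative_state(a, j), conditional_expectation(a, j))
```

For each j the two numbers agree up to rounding, which is the content of (1.12) in this small case.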

An important representation of a composite system state $\psi^S$, in terms
of an orthonormal set $\{\theta_j\}$ in one subsystem $S_2$ and the set of relative
states $\{\psi^{\theta_j}_{\mathrm{rel}}\}$ in $S_1$ is:

(1.13)  $\psi^S = \sum_{ij} (\phi_i \theta_j, \psi^S) \phi_i \theta_j = \sum_j \left[\sum_i (\phi_i \theta_j, \psi^S) \phi_i\right] \theta_j$
        $= \sum_j \frac{1}{N_j} \left[N_j \sum_i (\phi_i \theta_j, \psi^S) \phi_i\right] \theta_j$
        $= \sum_j \frac{1}{N_j} \psi^{\theta_j}_{\mathrm{rel}} \theta_j$ ,   where $1/N_j^2 = P_j = \langle I^1 [\theta_j] \rangle\psi^S$ .

Thus, for any orthonormal set in one subsystem, the state of the composite
system is a single superposition of elements consisting of a state of the
given set and its relative state in the other subsystem. (The relative
states, however, are not necessarily orthogonal.) We notice further that a
particular element, $\psi^{\theta_j}_{\mathrm{rel}} \theta_j$, is quite independent of the choice of basis
$\{\theta_k\}$, $k \neq j$, for the orthogonal space of $\theta_j$, since $\psi^{\theta_j}_{\mathrm{rel}}$ depends only on
$\theta_j$ and not on the other $\theta_k$ for $k \neq j$. We remark at this point that the
ambiguity in the relative state which arises when $\sum_i (\phi_i \theta_j, \psi^S) \phi_i = 0$
(see p. 38) is unimportant for this representation, since although any
state $\psi^{\theta_j}_{\mathrm{rel}}$ can be regarded as the relative state in this case, the term
$\psi^{\theta_j}_{\mathrm{rel}} \theta_j$ will occur in (1.13) with coefficient zero.

Now that we have found subsystem states which correctly give conditional
expectations, we might inquire whether there exist subsystem states
which give marginal expectations. The answer is, unfortunately, no. Let
us compute the marginal expectation of $A$ in $S_1$ using the representation (1.13):

$$\begin{aligned}
\mathrm{Exp}[A] = \langle A I^2\rangle\psi^S &= \Big(\sum_j \frac{1}{N_j}\,\psi^{\theta_j}_{rel}\theta_j,\; A I^2 \sum_k \frac{1}{N_k}\,\psi^{\theta_k}_{rel}\theta_k\Big) \\
&= \sum_{jk} \frac{1}{N_j N_k}\big(\psi^{\theta_j}_{rel}, A\,\psi^{\theta_k}_{rel}\big)\,\delta_{jk} \\
&= \sum_j \frac{1}{N_j^2}\big(\psi^{\theta_j}_{rel}, A\,\psi^{\theta_j}_{rel}\big) = \sum_j P_j\,\langle A\rangle\psi^{\theta_j}_{rel} .
\end{aligned} \tag{1.14}$$

Now suppose that there exists a state in $S_1$, $\psi'$, which correctly gives
the marginal expectation (1.14) for all operators $A$ (i.e., such that
$\mathrm{Exp}[A] = \langle A\rangle\psi'$ for all $A$). One such operator is $[\psi']$, the projection
on $\psi'$, for which $\langle[\psi']\rangle\psi' = 1$. But, from (1.14) we have that
$\mathrm{Exp}[[\psi']] = \sum_j P_j\,\langle[\psi']\rangle\psi^{\theta_j}_{rel}$, which is $< 1$ unless, for all $j$, $P_j = 0$ or
$\psi^{\theta_j}_{rel} = \psi'$, a condition which is not generally true. Therefore there exists in general
no state for $S_1$ which correctly gives the marginal expectations for all
operators in $S_1$.

However, even though there is generally no single state describing
marginal expectations, we see that there is always a mixture of states,
namely the states $\psi^{\theta_j}_{rel}$ weighted with $P_j$, which does yield the correct
expectations. The distinction between a mixture, $M$, of states $\phi_i$,
weighted by $P_i$, and a pure state $\psi$ which is a superposition, $\psi = \sum_i a_i\phi_i$,
is that there are no interference phenomena between the various
states of a mixture. The expectation of an operator $A$ for the mixture is
$\mathrm{Exp}_M[A] = \sum_i P_i\,\langle A\rangle\phi_i = \sum_i P_i\,(\phi_i, A\phi_i)$, while the expectation for the
pure state $\psi$ is $\langle A\rangle\psi = \big(\sum_i a_i\phi_i,\, A\sum_j a_j\phi_j\big) = \sum_{ij} a_i^* a_j\,(\phi_i, A\phi_j)$,
which is not the same as that of the mixture with weights $P_i = a_i^* a_i$, due
to the presence of the interference terms $(\phi_i, A\phi_j)$ for $j \neq i$.
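The difference between a superposition and a mixture with the same weights shows up as soon as the operator has off-diagonal elements. The following is a minimal sketch with assumed values: two orthonormal states, equal amplitudes, and an operator whose only non-zero elements are $(\phi_0, A\phi_1)$ and $(\phi_1, A\phi_0)$.

```python
import math

# Assumed amplitudes a_i of the pure state psi = sum_i a_i phi_i.
amps = [1 / math.sqrt(2), 1 / math.sqrt(2)]
# Assumed operator matrix A[i][j] = (phi_i, A phi_j), purely off-diagonal.
A = [[0.0, 1.0],
     [1.0, 0.0]]

def pure_expectation(A, amps):
    """<A>psi = sum_ij a_i^* a_j (phi_i, A phi_j), interference terms included."""
    n = len(amps)
    return sum((amps[i].conjugate() * amps[j] * A[i][j]).real
               for i in range(n) for j in range(n))

def mixture_expectation(A, weights):
    """Exp_M[A] = sum_i P_i (phi_i, A phi_i): only diagonal elements contribute."""
    return sum(p * A[i][i] for i, p in enumerate(weights))

weights = [abs(c) ** 2 for c in amps]      # P_i = a_i^* a_i
pure = pure_expectation(A, amps)           # built entirely from interference terms
mixed = mixture_expectation(A, weights)    # no interference terms at all
```

With these assumed numbers the pure state gives an expectation of about 1 while the mixture gives 0, although both assign weight 1/2 to each $\phi_i$.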

It is convenient to represent such a mixture by a density matrix,^4 $\rho$.
If the mixture consists of the states $\psi_j$ weighted by $P_j$, and if we are
working in a basis consisting of the complete orthonormal set $\{\phi_i\}$, where
$\psi_j = \sum_i a^j_i\phi_i$, then we define the elements of the density matrix for the
mixture to be:

$$\rho_{k\ell} = \sum_j P_j\, a^{j*}_\ell a^j_k \qquad \big(a^j_i = (\phi_i, \psi_j)\big) . \tag{1.15}$$

Then if $A$ is any operator, with matrix representation $A_{i\ell} = (\phi_i, A\phi_\ell)$
in the chosen basis, its expectation for the mixture is:

$$\begin{aligned}
\mathrm{Exp}_M[A] &= \sum_j P_j\,(\psi_j, A\psi_j) = \sum_j P_j \Big[\sum_{i\ell} a^{j*}_i a^j_\ell\,(\phi_i, A\phi_\ell)\Big] \\
&= \sum_{i\ell}\Big(\sum_j P_j\, a^{j*}_i a^j_\ell\Big)(\phi_i, A\phi_\ell) = \sum_{i\ell} \rho_{\ell i} A_{i\ell} \\
&= \mathrm{Trace}(\rho A) .
\end{aligned} \tag{1.16}$$

4 Also called a statistical operator (von Neumann [17]).

Therefore any mixture is adequately represented by a density matrix.^5
Note also that $\rho^*_{k\ell} = \rho_{\ell k}$, so that $\rho$ is Hermitian.

Let us now find the density matrices $\rho^1$ and $\rho^2$ for the subsystems
$S_1$ and $S_2$ of a system $S = S_1 + S_2$ in the state $\psi^S$. Furthermore, let
us choose the orthonormal bases $\{\xi_i\}$ and $\{\eta_j\}$ in $S_1$ and $S_2$ respectively,
and let $A$ be an operator in $S_1$, $B$ an operator in $S_2$. Then:

$$\begin{aligned}
\mathrm{Exp}[A] = \langle A I^2\rangle\psi^S &= \Big(\sum_{ij}(\xi_i\eta_j, \psi^S)\,\xi_i\eta_j,\; A I^2 \sum_{\ell m}(\xi_\ell\eta_m, \psi^S)\,\xi_\ell\eta_m\Big) \\
&= \sum_{ij\ell m}(\xi_i\eta_j, \psi^S)^*(\xi_\ell\eta_m, \psi^S)\,(\xi_i, A\xi_\ell)(\eta_j, \eta_m) \\
&= \sum_{i\ell}\Big[\sum_j (\xi_i\eta_j, \psi^S)^*(\xi_\ell\eta_j, \psi^S)\Big](\xi_i, A\xi_\ell) \\
&= \mathrm{Trace}(\rho^1 A) ,
\end{aligned} \tag{1.17}$$

where we have defined $\rho^1$ in the $\{\xi_i\}$ basis to be:

$$\rho^1_{\ell i} = \sum_j (\xi_i\eta_j, \psi^S)^*(\xi_\ell\eta_j, \psi^S) . \tag{1.18}$$

In a similar fashion we find that $\rho^2$ is given, in the $\{\eta_j\}$ basis, by:

$$\rho^2_{mn} = \sum_i (\xi_i\eta_n, \psi^S)^*(\xi_i\eta_m, \psi^S) . \tag{1.19}$$

It can be easily shown that here again the dependence of $\rho^1$ upon the
choice of basis $\{\eta_j\}$ in $S_2$, and of $\rho^2$ upon $\{\xi_i\}$, is only apparent.

5 A better, coordinate free representation of a mixture is in terms of the operator
which the density matrix represents. For a mixture of states $\psi_n$ (not necessarily
orthogonal) with weights $p_n$, the density operator is $\rho = \sum_n p_n[\psi_n]$, where
$[\psi_n]$ stands for the projection operator on $\psi_n$.
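The index conventions of (1.18) and (1.19) are easy to get wrong, so a small numerical check may help. The sketch below assumes two two-level subsystems with real amplitudes `a[i][j]` standing for $(\xi_i\eta_j, \psi^S)$ and an arbitrary symmetric matrix `A`; all numbers are illustrative. It builds both subsystem density matrices and confirms that Trace$(\rho^1 A)$ equals the marginal expectation computed from the composite state.

```python
# Assumed amplitudes a[i][j] = (xi_i eta_j, psi^S) for two two-level subsystems.
a = [[0.8, 0.0],
     [0.36, 0.48]]
# Assumed real symmetric operator on S1 in the {xi_i} basis.
A = [[1.0, 0.5],
     [0.5, -1.0]]

def rho1(a):
    """rho1[l][i] = sum_j a_ij^* a_lj, following (1.18)."""
    n, m = len(a), len(a[0])
    return [[sum(a[i][j].conjugate() * a[l][j] for j in range(m))
             for i in range(n)] for l in range(n)]

def rho2(a):
    """rho2[m][n] = sum_i a_in^* a_im, following (1.19)."""
    rows, cols = len(a), len(a[0])
    return [[sum(a[i][n].conjugate() * a[i][m] for i in range(rows))
             for n in range(cols)] for m in range(cols)]

def trace_product(rho, A):
    """Trace(rho A) = sum over l, i of rho[l][i] * A[i][l]."""
    size = len(rho)
    return sum(rho[l][i] * A[i][l] for l in range(size) for i in range(size))

def marginal_expectation(A, a):
    """<A I^2> psi^S computed directly from the composite amplitudes."""
    n, m = len(a), len(a[0])
    return sum(a[i][j].conjugate() * A[i][l] * a[l][j]
               for i in range(n) for l in range(n) for j in range(m))

r1, r2 = rho1(a), rho2(a)
from_density_matrix = trace_product(r1, A)
from_composite_state = marginal_expectation(A, a)
```

Both density matrices have unit trace here because the assumed amplitudes are normalized.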
In summary, we have seen in this section that a state of a composite
system leads to joint distributions over subsystem quantities which are 
generally not independent. Conditional distributions and expectations for 
subsystems are obtained from relative states, and subsystem marginal 
distributions and expectations are given by density matrices. 

There does not, in general, exist anything like a single state for one
subsystem of a composite system. That is, subsystems do not possess
states independent of the states of the remainder of the system, so that
the subsystem states are generally correlated. One can arbitrarily choose
a state for one subsystem, and be led to the relative state for the other
subsystem. Thus we are faced with a fundamental relativity of states,
which is implied by the formalism of composite systems. It is meaningless
to ask the absolute state of a subsystem; one can only ask the
state relative to a given state of the remainder of the system.

§2. Information and correlation in quantum mechanics 

We wish to be able to discuss information and correlation for Hermitian
operators $A, B, \ldots$, with respect to a state function $\psi$. These
quantities are to be computed, through the formulas of the preceding
chapter, from the square amplitudes of the coefficients of the expansion
of $\psi$ in terms of the eigenstates of the operators.

We have already seen (p. 34) that a state $\psi$ and an orthonormal basis
$\{\phi_i\}$ lead to a square amplitude distribution of $\psi$ over the set $\{\phi_i\}$:

$$P_i = |(\phi_i, \psi)|^2 = \langle[\phi_i]\rangle\psi , \tag{2.1}$$

so that we can define the information of the basis $\{\phi_i\}$ for the state $\psi$,
$I_{\{\phi_i\}}(\psi)$, to be simply the information of this distribution relative to the
uniform measure:

$$I_{\{\phi_i\}}(\psi) = \sum_i P_i \ln P_i = \sum_i |(\phi_i, \psi)|^2 \ln |(\phi_i, \psi)|^2 . \tag{2.2}$$






HUGH EVERETT, III 



We define the information of an operator $A$, for the state $\psi$, $I_A(\psi)$,
to be the information in the square amplitude distribution over its eigenvalues,
i.e., the information of the probability distribution over the results
of a determination of $A$ which is prescribed in the probabilistic interpretation.
For a non-degenerate operator $A$ this distribution is the same as
the distribution (2.1) over the eigenstates. But because the information
is dependent only on the distribution, and not on numerical values, the
information of the distribution over eigenvalues of $A$ is precisely the
information of the eigenbasis of $A$, $\{\phi_i\}$. Therefore:

$$I_A(\psi) = I_{\{\phi_i\}}(\psi) = \sum_i \langle[\phi_i]\rangle\psi \,\ln \langle[\phi_i]\rangle\psi \qquad (A \text{ non-degenerate}) . \tag{2.3}$$

We see that for fixed $\psi$, the information of all non-degenerate operators
having the same set of eigenstates is the same.

In the case of degenerate operators it will be convenient to take, as
the definition of information, the information of the square amplitude
distribution over the eigenvalues relative to the information measure which
consists of the multiplicity of the eigenvalues, rather than the uniform
measure. This definition preserves the choice of uniform measure over
the eigenstates, in distinction to the eigenvalues. If $\phi_{ij}$ ($j$ from 1 to $m_i$)
are a complete orthonormal set of eigenstates for $A'$, with distinct eigenvalues
$\lambda_i$ (degenerate with respect to $j$), then the multiplicity of the $i$th
eigenvalue is $m_i$ and the information $I_{A'}(\psi)$ is defined to be:

$$I_{A'}(\psi) = \sum_i \Big(\sum_j \langle[\phi_{ij}]\rangle\psi\Big) \ln \frac{\sum_j \langle[\phi_{ij}]\rangle\psi}{m_i} . \tag{2.4}$$

The usefulness of this definition lies in the fact that any operator $A''$
which distinguishes further between any of the degenerate states of $A'$
leads to a refinement of the relative density, in the sense of Theorem 4,
and consequently has equal or greater information. A non-degenerate
operator thus represents the maximal refinement and possesses maximal
information.

It is convenient to introduce a new notation for the projection operators
which are relevant for a specified operator. As before let $A$ have
eigenfunctions $\phi_{ij}$ and distinct eigenvalues $\lambda_i$. Then define the projections
$A_i$, the projections on the eigenspaces of different eigenvalues of
$A$, to be:

$$A_i = \sum_{j=1}^{m_i} [\phi_{ij}] . \tag{2.5}$$

To each such projection there is associated a number $m_i$, the multiplicity
of the degeneracy, which is the dimension of the $i$th eigenspace. In this
notation the distribution over the eigenvalues of $A$ for the state $\psi$, $P_i$,
becomes simply:

$$P_i = P(\lambda_i) = \langle A_i\rangle\psi , \tag{2.6}$$

and the information, given by (2.4), becomes:

$$I_A = \sum_i \langle A_i\rangle\psi \,\ln \frac{\langle A_i\rangle\psi}{m_i} . \tag{2.7}$$
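The role of the multiplicities $m_i$ in (2.4) and (2.7) can be seen on a four-state example. In the sketch below the square amplitudes and the grouping into degenerate eigenvalues are assumed for illustration; the function name `information` is likewise just a label for $\sum_i P_i \ln(P_i/m_i)$.

```python
import math

def information(probs, multiplicities=None):
    """I = sum_i P_i ln(P_i / m_i); with every m_i = 1 this is sum_i P_i ln P_i."""
    if multiplicities is None:
        multiplicities = [1] * len(probs)
    return sum(p * math.log(p / m) for p, m in zip(probs, multiplicities) if p > 0)

# Assumed square amplitudes <[phi_ij]>psi over four eigenstates.
eigenstate_probs = [0.5, 0.3, 0.15, 0.05]

# A' has two doubly degenerate eigenvalues: states (0, 1) and states (2, 3).
degenerate_probs = [0.5 + 0.3, 0.15 + 0.05]
info_degenerate = information(degenerate_probs, [2, 2])

# A'' removes the degeneracy, so every eigenvalue has multiplicity 1.
info_refined = information(eigenstate_probs)
```

For these numbers the refined operator has the larger information, in line with the remark that a non-degenerate operator represents the maximal refinement.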



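Similarly, for a pair of operators, $A$ in $S_1$ and $B$ in $S_2$, for the
composite system $S = S_1 + S_2$ with state $\psi^S$, the joint distribution over
eigenvalues is:

$$P_{ij} = P(\lambda_i, \mu_j) = \langle A_i B_j\rangle\psi^S , \tag{2.8}$$

and the marginal distributions are:

$$\begin{aligned}
P_i &= \sum_j P_{ij} = \Big\langle A_i \Big(\sum_j B_j\Big)\Big\rangle\psi^S = \langle A_i I^2\rangle\psi^S , \\
P_j &= \sum_i P_{ij} = \Big\langle \Big(\sum_i A_i\Big) B_j\Big\rangle\psi^S = \langle I^1 B_j\rangle\psi^S .
\end{aligned} \tag{2.9}$$

The joint information, $I_{AB}$, is given by:

$$I_{AB} = \sum_{ij} P_{ij} \ln \frac{P_{ij}}{m_i n_j} = \sum_{ij} \langle A_i B_j\rangle\psi^S \,\ln \frac{\langle A_i B_j\rangle\psi^S}{m_i n_j} , \tag{2.10}$$

where $m_i$ and $n_j$ are the multiplicities of the eigenvalues $\lambda_i$ and $\mu_j$.
The marginal information quantities are given by:

$$\begin{aligned}
I_A &= \sum_i \langle A_i I^2\rangle\psi^S \,\ln \frac{\langle A_i I^2\rangle\psi^S}{m_i} , \\
I_B &= \sum_j \langle I^1 B_j\rangle\psi^S \,\ln \frac{\langle I^1 B_j\rangle\psi^S}{n_j} ,
\end{aligned} \tag{2.11}$$

and finally the correlation, $\{A,B\}\psi^S$, is given by:

$$\{A,B\}\psi^S = \sum_{ij} P_{ij} \ln \frac{P_{ij}}{P_i P_j} = \sum_{ij} \langle A_i B_j\rangle\psi^S \,\ln \frac{\langle A_i B_j\rangle\psi^S}{\langle A_i I^2\rangle\psi^S \,\langle I^1 B_j\rangle\psi^S} , \tag{2.12}$$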

where we note that the expression does not involve the multiplicities, as 
do the information expressions, a circumstance which simply reflects the 
independence of correlation on any information measure. These expres- 
sions of course generalize trivially to distributions over more than two 
variables (composite systems of more than two subsystems). 
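That the multiplicities drop out of the correlation can be verified directly. The sketch below uses an assumed 2 x 2 joint distribution and assumed multiplicities, computes $I_{AB}$, $I_A$ and $I_B$ with the multiplicity measure, and compares $I_{AB} - I_A - I_B$ with the multiplicity-free expression (2.12).

```python
import math

def info(probs, mults):
    """sum P ln(P / m), skipping zero-probability entries."""
    return sum(p * math.log(p / m) for p, m in zip(probs, mults) if p > 0)

def correlation(joint):
    """{A,B} = sum_ij P_ij ln(P_ij / (P_i P_j)), as in (2.12)."""
    row = [sum(r) for r in joint]              # marginal P_i
    col = [sum(c) for c in zip(*joint)]        # marginal P_j
    return sum(p * math.log(p / (row[i] * col[j]))
               for i, r in enumerate(joint) for j, p in enumerate(r) if p > 0)

# Assumed joint distribution P_ij over eigenvalues of A (rows) and B (columns).
joint = [[0.4, 0.1],
         [0.1, 0.4]]
m, n = [2, 1], [1, 3]                          # assumed multiplicities m_i, n_j

flat = [p for r in joint for p in r]
flat_mults = [mi * nj for mi in m for nj in n]  # same row-major order as flat
i_ab = info(flat, flat_mults)                   # joint information (2.10)
i_a = info([sum(r) for r in joint], m)          # marginal informations (2.11)
i_b = info([sum(c) for c in zip(*joint)], n)

corr = correlation(joint)
```

Changing `m` or `n` changes each of the three information values but leaves `i_ab - i_a - i_b` equal to `corr`.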

In addition to the correlation of pairs of subsystem operators, given
by (2.12), there always exists a unique quantity $\{S_1, S_2\}$, the canonical
correlation, which has some special properties and may be regarded as
the fundamental correlation between the two subsystems $S_1$ and $S_2$ of
the composite system $S$. As we remarked earlier a density matrix is
Hermitian, so that there is a representation in which it is diagonal.^6 In
particular, for the decomposition of $S$ (with state $\psi^S$) into $S_1$ and $S_2$,
we can choose a representation in which both $\rho^{S_1}$ and $\rho^{S_2}$ are diagonal.
(This choice is always possible because $\rho^{S_1}$ is independent of the basis
in $S_2$ and vice-versa.) Such a representation will be called a canonical
representation. This means that it is always possible to represent the
state $\psi^S$ by a single superposition:

$$\psi^S = \sum_i a_i\,\xi_i\eta_i , \tag{2.13}$$

where both the $\{\xi_i\}$ and the $\{\eta_i\}$ constitute orthonormal sets of states
for $S_1$ and $S_2$ respectively.

6 The density matrix of a subsystem always has a pure discrete spectrum, if
the composite system is in a state. To see this we note that the choice of any
orthonormal basis in $S_2$ leads to a discrete (i.e., denumerable) set of relative
states in $S_1$. The density matrix in $S_1$ then represents this discrete mixture,
$\psi^{\theta_j}_{rel}$ weighted by $P_j$. This means that the expectation of the identity,
$\mathrm{Exp}[I] = \sum_j P_j\,(\psi^{\theta_j}_{rel}, I\,\psi^{\theta_j}_{rel}) = \sum_j P_j = 1 = \mathrm{Trace}(\rho I) = \mathrm{Trace}(\rho)$.
Therefore $\rho$ has a finite trace and is a completely continuous operator, having
necessarily a pure discrete spectrum. (See von Neumann [17], p. 89, footnote 115.)

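To construct such a representation choose the basis $\{\eta_i\}$ for $S_2$ so
that $\rho^{S_2}$ is diagonal:

$$\rho^{S_2}_{ij} = \lambda_i\,\delta_{ij} , \tag{2.14}$$

and let the $\xi_i$ be the relative states in $S_1$ for the $\eta_i$ in $S_2$:

$$\xi_i = N_i \sum_j (\phi_j\eta_i, \psi^S)\,\phi_j \qquad (\text{any basis } \{\phi_j\}) . \tag{2.15}$$

Then, according to (1.13), $\psi^S$ is represented in the form (2.13) where the
$\{\eta_i\}$ are orthonormal by choice, and the $\{\xi_i\}$ are normal since they are
relative states. We therefore need only show that the states $\{\xi_i\}$ are
orthogonal:

$$\begin{aligned}
(\xi_j, \xi_k) &= \Big(N_j \sum_\ell (\phi_\ell\eta_j, \psi^S)\,\phi_\ell,\; N_k \sum_m (\phi_m\eta_k, \psi^S)\,\phi_m\Big) \\
&= N_j^* N_k \sum_{\ell m} (\phi_\ell\eta_j, \psi^S)^*(\phi_m\eta_k, \psi^S)\,\delta_{\ell m} \\
&= N_j^* N_k \sum_\ell (\phi_\ell\eta_j, \psi^S)^*(\phi_\ell\eta_k, \psi^S) \\
&= N_j^* N_k\,\rho^{S_2}_{kj} = N_j^* N_k\,\lambda_k\,\delta_{kj} = 0 , \qquad \text{for } j \neq k ,
\end{aligned} \tag{2.16}$$

since we supposed $\rho^{S_2}$ to be diagonal in this representation. We have
therefore constructed a canonical representation (2.13).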

The density matrix $\rho^{S_1}$ is also automatically diagonal, by the choice
of representation consisting of the basis in $S_2$ which makes $\rho^{S_2}$ diagonal
and the corresponding relative states in $S_1$. Since $\{\xi_i\}$ are orthonormal
we have (with the index order of (1.18)):

$$\begin{aligned}
\rho^{S_1}_{ij} &= \sum_k (\xi_j\eta_k, \psi^S)^*(\xi_i\eta_k, \psi^S) \\
&= \sum_k \Big(\xi_j\eta_k, \sum_m a_m\xi_m\eta_m\Big)^*\Big(\xi_i\eta_k, \sum_\ell a_\ell\xi_\ell\eta_\ell\Big) \\
&= \sum_{k\ell m} a_m^* a_\ell\,\delta_{jm}\delta_{km}\delta_{i\ell}\delta_{k\ell} = \sum_k a_j^* a_i\,\delta_{kj}\delta_{ki} \\
&= a_i^* a_i\,\delta_{ij} = P_i\,\delta_{ij} ,
\end{aligned} \tag{2.17}$$

where $P_i = a_i^* a_i$ is the marginal distribution over the $\{\xi_i\}$. Similar
computation shows that the elements of $\rho^{S_2}$ are the same:

$$\rho^{S_2}_{k\ell} = a_k^* a_k\,\delta_{k\ell} = P_k\,\delta_{k\ell} . \tag{2.18}$$

Thus in the canonical representation both density matrices are diagonal
and have the same elements, $P_k$, which give the marginal square amplitude
distribution over both of the sets $\{\xi_i\}$ and $\{\eta_i\}$ forming the basis
of the representation.

Now, any pair of operators, $\tilde{A}$ in $S_1$ and $\tilde{B}$ in $S_2$, which have as
non-degenerate eigenfunctions the sets $\{\xi_i\}$ and $\{\eta_j\}$ (i.e., operators
which define the canonical representation), are "perfectly" correlated in
the sense that there is a one-one correspondence between their eigenvalues.
The joint square amplitude distribution for eigenvalues $\lambda_i$ of $\tilde{A}$
and $\mu_j$ of $\tilde{B}$ is:

$$P(\lambda_i \text{ and } \mu_j) = P(\xi_i \text{ and } \eta_j) = P_{ij} = a_i^* a_i\,\delta_{ij} = P_i\,\delta_{ij} . \tag{2.19}$$

Therefore, the correlation between these operators, $\{\tilde{A}, \tilde{B}\}\psi^S$, is:

$$\begin{aligned}
\{\tilde{A}, \tilde{B}\}\psi^S &= \sum_{ij} P(\lambda_i \text{ and } \mu_j) \ln \frac{P(\lambda_i \text{ and } \mu_j)}{P(\lambda_i)P(\mu_j)} = \sum_{ij} P_i\,\delta_{ij} \ln \frac{P_i\,\delta_{ij}}{P_i P_j} \\
&= -\sum_i P_i \ln P_i .
\end{aligned} \tag{2.20}$$

We shall denote this quantity by $\{S_1, S_2\}\psi^S$ and call it the canonical
correlation of the subsystems $S_1$ and $S_2$ for the system state $\psi^S$. It
is the correlation between any pair of non-degenerate subsystem operators
which define the canonical representation.

In the canonical representation, where the density matrices are diagonal
((2.17) and (2.18)), the canonical correlation is given by:

$$\begin{aligned}
\{S_1, S_2\}\psi^S = -\sum_i P_i \ln P_i &= -\mathrm{Trace}(\rho^{S_1} \ln \rho^{S_1}) \\
&= -\mathrm{Trace}(\rho^{S_2} \ln \rho^{S_2}) .
\end{aligned} \tag{2.21}$$

But the trace is invariant for unitary transformations, so that (2.21) holds
independently of the representation, and we have therefore established
the uniqueness of $\{S_1, S_2\}\psi^S$.
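Because (2.21) is representation independent, the canonical correlation can be computed from either subsystem density matrix without first finding the canonical bases. The sketch below does this for an assumed real 2 x 2 amplitude matrix that is not in canonical form; the closed-form eigenvalue formula is specific to the 2 x 2 case and is used only to keep the example free of external libraries.

```python
import math

# Assumed real amplitudes a[i][j] = (xi_i eta_j, psi^S), not in canonical form.
a = [[0.8, 0.0],
     [0.36, 0.48]]

def reduced_matrices(a):
    """Return (rho1, rho2) for real 2x2 amplitudes: rho1 = a a^T, rho2 = a^T a."""
    rho1 = [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)] for i in range(2)]
    rho2 = [[sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return rho1, rho2

def eigenvalues_2x2(rho):
    """Eigenvalues of a real symmetric 2x2 matrix from its trace and determinant."""
    tr = rho[0][0] + rho[1][1]
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    return [(tr + root) / 2.0, (tr - root) / 2.0]

def minus_trace_rho_ln_rho(rho):
    """-Trace(rho ln rho), evaluated in the basis where rho is diagonal."""
    return -sum(p * math.log(p) for p in eigenvalues_2x2(rho) if p > 0)

rho1, rho2 = reduced_matrices(a)
corr_from_s1 = minus_trace_rho_ln_rho(rho1)
corr_from_s2 = minus_trace_rho_ln_rho(rho2)
```

The two matrices differ element by element, yet they share the same eigenvalues $P_i$, so both routes return the same canonical correlation.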

It is also interesting to note that the quantity $-\mathrm{Trace}(\rho \ln \rho)$ is
(apart from a factor of Boltzmann's constant) just the entropy of a mixture
of states characterized by the density matrix $\rho$.^7 Therefore the entropy
of the mixture characteristic of a subsystem $S_1$ for the state $\psi^S = \psi^{S_1+S_2}$
is exactly matched by a correlation information $\{S_1, S_2\}$, which
represents the correlation between any pair of operators $\tilde{A}$, $\tilde{B}$, which
define the canonical representation. The situation is thus quite similar
to that of classical mechanics.^8

7 See von Neumann [17], p. 296.
8 Cf. Chapter II, §7.

Another special property of the canonical representation is that any
operators $\tilde{A}$, $\tilde{B}$ defining a canonical representation have maximum marginal
information, in the sense that for any other discrete spectrum operators,
$A$ on $S_1$, $B$ on $S_2$, $I_A \leq I_{\tilde{A}}$ and $I_B \leq I_{\tilde{B}}$. If the canonical representation
is (2.13), with $\{\xi_i\}$, $\{\eta_i\}$ non-degenerate eigenfunctions of $\tilde{A}$,
$\tilde{B}$, respectively, and $A$, $B$ any pair of non-degenerate operators with
eigenfunctions $\{\phi_k\}$ and $\{\theta_\ell\}$, where $\xi_i = \sum_k c_{ik}\phi_k$, $\eta_i = \sum_\ell d_{i\ell}\theta_\ell$,
then $\psi^S$ in $\phi$, $\theta$ representation is:

$$\psi^S = \sum_{ik\ell} a_i c_{ik} d_{i\ell}\,\phi_k\theta_\ell = \sum_{k\ell}\Big(\sum_i a_i c_{ik} d_{i\ell}\Big)\phi_k\theta_\ell , \tag{2.22}$$

and the joint square amplitude distribution for $\phi_k$, $\theta_\ell$ is:

$$P_{k\ell} = \Big|\sum_i a_i c_{ik} d_{i\ell}\Big|^2 = \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk}\, d_{i\ell}^* d_{m\ell} , \tag{2.23}$$

while the marginals are:

$$\begin{aligned}
P_k = \sum_\ell P_{k\ell} &= \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk} \sum_\ell d_{i\ell}^* d_{m\ell} \\
&= \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk}\,\delta_{im} = \sum_i a_i^* a_i\, c_{ik}^* c_{ik} ,
\end{aligned} \tag{2.24}$$

and similarly

$$P_\ell = \sum_k P_{k\ell} = \sum_i a_i^* a_i\, d_{i\ell}^* d_{i\ell} . \tag{2.25}$$

Then the marginal information $I_A$ is:

$$\begin{aligned}
I_A = \sum_k P_k \ln P_k &= \sum_k \Big(\sum_i a_i^* a_i\, c_{ik}^* c_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, c_{ik}^* c_{ik}\Big) \\
&= \sum_k \Big(\sum_i a_i^* a_i\, T_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, T_{ik}\Big) ,
\end{aligned} \tag{2.26}$$

where $T_{ik} = c_{ik}^* c_{ik}$ is doubly-stochastic ($\sum_i T_{ik} = \sum_k T_{ik} = 1$ follows
from unitary nature of the $c_{ik}$). Therefore (by Corollary 2, §4, Appendix I):

$$I_A = \sum_k \Big(\sum_i a_i^* a_i\, T_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, T_{ik}\Big) \leq \sum_i a_i^* a_i \ln a_i^* a_i = I_{\tilde{A}} , \tag{2.27}$$

and we have proved that $\tilde{A}$ has maximal marginal information among the
discrete spectrum operators. Identical proof holds for $\tilde{B}$.
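The effect of the doubly-stochastic matrix in (2.26) and (2.27) can be illustrated in two dimensions. In the sketch below the canonical weights and the real rotation used for the change of basis are assumed values; the marginal over the rotated basis is obtained from $P_k = \sum_i P_i T_{ik}$ and its information is compared with that of the canonical basis.

```python
import math

def information(probs):
    """sum_k P_k ln P_k, skipping zero entries."""
    return sum(p * math.log(p) for p in probs if p > 0)

def rotated_marginal(canonical_probs, angle):
    """P_k = sum_i P_i T_ik with T_ik = c_ik^2 for a real 2x2 rotation c (assumed example)."""
    c = [[math.cos(angle), math.sin(angle)],
         [-math.sin(angle), math.cos(angle)]]
    T = [[c[i][k] ** 2 for k in range(2)] for i in range(2)]   # rows and columns sum to 1
    return [sum(canonical_probs[i] * T[i][k] for i in range(2)) for k in range(2)]

canonical_probs = [0.9, 0.1]                   # assumed P_i = a_i^* a_i
info_canonical = information(canonical_probs)

angles = [0.0, 0.3, math.pi / 4, 1.2]
info_rotated = [information(rotated_marginal(canonical_probs, t)) for t in angles]
```

No rotation angle raises the information above the canonical value; at an angle of pi/4 the marginal becomes uniform and the information falls to ln(1/2).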

While this result was proved only for non-degenerate operators, it is 
immediately extended to the degenerate case, since as a consequence of 
our definition of information for a degenerate operator, (2.4), its informa- 
tion is still less than that of an operator which removes the degeneracy. 
We have thus proved: 

THEOREM. $I_A \leq I_{\tilde{A}}$, where $\tilde{A}$ is any non-degenerate operator defining
the canonical representation, and $A$ is any operator with discrete spectrum.

We conclude the discussion of the canonical representation by conjecturing
that in addition to the maximum marginal information properties of
$\tilde{A}$, $\tilde{B}$, which define the representation, they are also maximally correlated,
by which we mean that for any pair of operators $C$ in $S_1$, $D$ in $S_2$,
$\{C,D\} \leq \{\tilde{A},\tilde{B}\}$, i.e.:

(2.28) CONJECTURE.^9  $\{C,D\}\psi^S \leq \{\tilde{A},\tilde{B}\}\psi^S = \{S_1,S_2\}\psi^S$
for all $C$ on $S_1$, $D$ on $S_2$.

9 The relations $\{C,\tilde{B}\} \leq \{\tilde{A},\tilde{B}\} = \{S_1,S_2\}$ and $\{\tilde{A},D\} \leq \{S_1,S_2\}$ for all $C$ on $S_1$,
$D$ on $S_2$, can be proved easily in a manner analogous to (2.27). These do not,
however, necessarily imply the general relation (2.28).

As a final topic for this section we point out that the uncertainty
principle can probably be phrased in a stronger form in terms of information.
The usual form of this principle is stated in terms of variances,
namely:

$$\sigma_x^2 \sigma_k^2 \geq \tfrac{1}{4} \qquad \text{for all } \psi(x) , \tag{2.29}$$

where $\sigma_x^2 = \langle x^2\rangle\psi - [\langle x\rangle\psi]^2$ and

$$\sigma_k^2 = \langle k^2\rangle\psi - [\langle k\rangle\psi]^2 = \Big\langle \Big(-i\frac{\partial}{\partial x}\Big)^2\Big\rangle\psi - \Big[\Big\langle -i\frac{\partial}{\partial x}\Big\rangle\psi\Big]^2 .$$

The conjectured information form of this principle is:

$$I_x + I_k \leq \ln(1/\pi e) \qquad \text{for all } \psi(x) . \tag{2.30}$$

Although this inequality has not yet been proved with complete rigor, it
is made highly probable by the circumstance that equality holds for $\psi(x)$
of the form $\psi(x) = (1/2\pi\sigma_x^2)^{1/4} \exp[-x^2/4\sigma_x^2]$, the so called "minimum
uncertainty packets" which give normal distributions for both position and
momentum, and that furthermore the first variation of $(I_x + I_k)$ vanishes
for such $\psi(x)$. (See Appendix I, §6.) Thus, although $\ln(1/\pi e)$ has not
been proved an absolute maximum of $I_x + I_k$, it is at least a stationary
value.

The principle (2.30) is stronger than (2.29), since it implies (2.29)
but is not implied by it. To see that it implies (2.29) we use the well
known fact (easily established by a variation calculation) that, for fixed
variance $\sigma^2$, the distribution of minimum information is a normal distribution,
which has information $I = \ln(1/\sigma\sqrt{2\pi e})$. This gives us the general
inequality involving information and variance:

$$I \geq \ln(1/\sigma\sqrt{2\pi e}) \qquad \text{(for all distributions)} . \tag{2.31}$$

Substitution of (2.31) into (2.30) then yields:

$$\begin{aligned}
&\ln(1/\sigma_x\sqrt{2\pi e}) + \ln(1/\sigma_k\sqrt{2\pi e}) \leq I_x + I_k \leq \ln(1/\pi e) \\
&\Rightarrow\; (1/\sigma_x\sigma_k 2\pi e) \leq (1/\pi e) \;\Rightarrow\; \sigma_x^2\sigma_k^2 \geq \tfrac{1}{4} ,
\end{aligned} \tag{2.32}$$

so that our principle implies the standard principle (2.29).

To show that (2.29) does not imply (2.30) it suffices to give a counterexample.
The distributions $P(x) = \tfrac{1}{2}\delta(x) + \tfrac{1}{2}\delta(x - 10)$ and
$P(k) = \tfrac{1}{2}\delta(k) + \tfrac{1}{2}\delta(k - 10)$, which consist simply of spikes at 0 and 10, clearly satisfy
(2.29), while they both have infinite information and thus do not satisfy
(2.30). Therefore it is possible to have arbitrarily high information about
both $x$ and $k$ (or $p$) and still satisfy (2.29). We have, then, another
illustration that information concepts are more powerful and more natural
than the older measures based upon variance.
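The equality case of (2.30) and the closed form used in (2.31) can be checked numerically for normal distributions. The sketch below is only an illustration under stated assumptions: the value of `sigma_x` is arbitrary, the integral of $P \ln P$ is estimated with a simple midpoint rule, and units are such that $\sigma_x\sigma_k = 1/2$ for a minimum uncertainty packet.

```python
import math

def gaussian_information(sigma, half_width=12.0, steps=20_000):
    """Midpoint-rule estimate of the integral of P ln P for a normal density of std sigma."""
    lo = -half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for n in range(steps):
        x = lo + (n + 0.5) * dx
        p = math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))
        if p > 0.0:
            total += p * math.log(p) * dx
    return total

sigma_x = 0.7                                   # assumed position spread
sigma_k = 1.0 / (2.0 * sigma_x)                 # minimum uncertainty: sigma_x * sigma_k = 1/2

i_x = gaussian_information(sigma_x)
i_k = gaussian_information(sigma_k)
closed_form_x = math.log(1.0 / (sigma_x * math.sqrt(2.0 * math.pi * math.e)))
bound = math.log(1.0 / (math.pi * math.e))      # right-hand side of (2.30)
```

Any other choice of `sigma_x` gives the same sum `i_x + i_k`, since the sum depends only on the product of the two spreads.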

§3. Measurement 

We now consider the question of measurement in quantum mechanics, 
which we desire to treat as a natural process within the theory of pure 
wave mechanics. From our point of view there is no fundamental distinc- 
tion between "measuring apparata" and other physical systems. For us, 
therefore, a measurement is simply a special case of interaction between 
physical systems — an interaction which has the property of correlating a 
quantity in one subsystem with a quantity in another. 

Nearly every interaction between systems produces some correlation
however. Suppose that at some instant a pair of systems are independent,
so that the composite system state function is a product of subsystem
states ($\psi^S = \psi^{S_1}\psi^{S_2}$). Then this condition obviously holds only instantaneously
if the systems are interacting;^10 the independence is immediately
destroyed and the systems become correlated. We could, then, take the
position that the two interacting systems are continually "measuring" one
another, if we wished. At each instant $t$ we could put the composite
system into canonical representation, and choose a pair of operators $A(t)$
in $S_1$ and $B(t)$ in $S_2$ which define this representation. We might then
reasonably assert that the quantity $A$ in $S_1$ is measured by $B$ in $S_2$
(or vice-versa), since there is a one-one correspondence between their
values.

10 If $U_t^S$ is the unitary operator generating the time dependence for the state
function of the composite system $S = S_1 + S_2$, so that $\psi_t^S = U_t^S\psi_0^S$, then we
shall say that $S_1$ and $S_2$ have not interacted during the time interval $[0,t]$ if
and only if $U_t^S$ is the direct product of two subsystem unitary operators, i.e., if
$U_t^S = U_t^{S_1} \otimes U_t^{S_2}$.

Such a viewpoint, however, does not correspond closely with our intuitive
idea of what constitutes "measurement," since the quantities $A$
and $B$ which turn out to be measured depend not only on the time, but
also upon the initial state of the composite system. A more reasonable
position is to associate the term "measurement" with a fixed interaction
$H$ between systems,^11 and to define the "measured quantities" not as
those quantities $A(t)$, $B(t)$ which are instantaneously canonically correlated,
but as the limit of the instantaneous canonical operators as the time
goes to infinity, $A_\infty$, $B_\infty$,^12 provided that this limit exists and is independent
of the initial state. In such a case we are able to associate the
"measured quantities," $A_\infty$, $B_\infty$, with the interaction $H$ independently
of the actual system states and the time. We can therefore say that $H$ is
an interaction which causes the quantity $A_\infty$ in $S_1$ to be measured by
$B_\infty$ in $S_2$. For finite times of interaction the measurement is only approximate,
approaching exactness as the time of interaction increases indefinitely.

11 Here $H$ means the total Hamiltonian of $S$, not just an interaction part.

12 Actually, rather than referring to canonical operators $A$, $B$, which are not
unique, we should refer to the bases of the canonical representation, $\{\xi_i\}$ in $S_1$
and $\{\eta_j\}$ in $S_2$, since any operators $A = \sum_i \lambda_i[\xi_i]$, $B = \sum_j \mu_j[\eta_j]$, with the
completely arbitrary eigenvalues $\lambda_i$, $\mu_j$, are canonical. The limit then refers to the
limit of the canonical bases, if it exists in some appropriate sense. However, we
shall, for convenience, continue to represent the canonical bases by operators.

There is still one more requirement that we must impose on an interaction
before we shall call it a measurement. If $H$ is to produce a
measurement of $A$ in $S_1$ by $B$ in $S_2$, then we require that $H$ shall
never decrease the information in the marginal distribution of $A$. If $H$
is to produce a measurement of $A$ by correlating it with $B$, we expect
that a knowledge of $B$ shall give us more information about $A$ than we
had before the measurement took place, since otherwise the measurement
would be useless. Now, $H$ might produce a correlation between $A$ and
$B$ by simply destroying the marginal information of $A$, without improving
the expected conditional information of $A$ given $B$, so that a knowledge
of $B$ would give us no more information about $A$ than we possessed
originally. Therefore in order to be sure that we will gain information
about $A$ by knowing $B$, when $B$ has become correlated with $A$, it is
necessary that the marginal information about $A$ has not decreased. The
expected information gain in this case is assured to be not less than the
correlation $\{A,B\}$.

The restriction that $H$ shall not decrease the marginal information
of $A$ has the interesting consequence that the eigenstates of $A$ will not
be disturbed, i.e., initial states of the form $\psi_0^S = \phi\,\eta_0$, where $\phi$ is an
eigenfunction of $A$, must be transformed after any time interval into
states of the form $\psi_t^S = \phi\,\eta_t$, since otherwise the marginal information of
$A$, which was initially perfect, would be decreased. This condition, in
turn, is connected with the repeatability of measurements, as we shall
subsequently see, and could alternately have been chosen as the condition
for measurement.

We shall therefore accept the following definition. An interaction $H$
is a measurement of $A$ in $S_1$ by $B$ in $S_2$ if $H$ does not destroy the
marginal information of $A$ (equivalently: if $H$ does not disturb the
eigenstates of $A$ in the above sense) and if furthermore the correlation
$\{A,B\}$ increases toward its maximum^13 with time.

13 The maximum of $\{A,B\}$ is $-I_A$ if $A$ has only a discrete spectrum, and $\infty$
if it has a continuous spectrum.






We now illustrate the production of correlation with an example of a
simplified measurement due to von Neumann.^14 Suppose that we have a
system of only one coordinate, $q$ (such as position of a particle), and
an apparatus of one coordinate $r$ (for example the position of a meter
needle). Further suppose that they are initially independent, so that the
combined wave function is $\psi_0^{S+A} = \phi(q)\,\eta(r)$, where $\phi(q)$ is the initial
system wave function, and $\eta(r)$ is the initial apparatus function. Finally
suppose that the masses are sufficiently large or the time of interaction
sufficiently small that the kinetic portion of the energy may be neglected,
so that during the time of measurement the Hamiltonian shall consist only
of an interaction, which we shall take to be:



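$$H_I = -i\hbar\, q\,\frac{\partial}{\partial r} . \tag{3.1}$$

Then it is easily verified that the state $\psi_t^{S+A}(q,r)$:

$$\psi_t^{S+A}(q,r) = \phi(q)\,\eta(r - qt) \tag{3.2}$$

is a solution of the Schrödinger equation

$$i\hbar\,\frac{\partial \psi_t^{S+A}}{\partial t} = H_I\,\psi_t^{S+A} \tag{3.3}$$

for the specified initial conditions at time $t = 0$.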




14 von Neumann [17], p. 442.



THEORY OF THE UNIVERSAL WAVE FUNCTION 






Translating (3.2) into square amplitudes we get:

$$P_t(q,r) = P_1(q)\,P_2(r - qt) , \tag{3.4}$$

where $P_1(q) = \phi^*(q)\phi(q)$ and $P_2(r) = \eta^*(r)\eta(r)$,
and we note that for a fixed time, $t$, the conditional square amplitude
distribution for $r$ has been translated by an amount depending upon the
value of $q$, while the marginal distribution for $q$ has been unaltered.
We see thus that a correlation has been introduced between $q$ and $r$ by
this interaction, which allows us to interpret it as a measurement. It is
instructive to see quantitatively how fast this correlation takes place. We
note that:

$$\begin{aligned}
I_{QR}(t) &= \iint P_t(q,r) \ln P_t(q,r)\, dq\, dr \\
&= \iint P_1(q)P_2(r - qt) \ln P_1(q)P_2(r - qt)\, dq\, dr \\
&= \iint P_1(q)P_2(\omega) \ln P_1(q)P_2(\omega)\, dq\, d\omega \\
&= I_{QR}(0) ,
\end{aligned} \tag{3.5}$$

so that the information of the joint distribution does not change. Furthermore,
since the marginal distribution for $q$ is unchanged:

$$I_Q(t) = I_Q(0) , \tag{3.6}$$

and the only quantity which can change is the marginal information, $I_R$,
of $r$, whose distribution is:

$$P_t(r) = \int P_t(r,q)\, dq = \int P_1(q)P_2(r - qt)\, dq . \tag{3.7}$$

Application of a special inequality (proved in §5, Appendix I) to (3.7)
yields the relation:

$$I_R(t) \leq I_Q(0) - \ln t , \tag{3.8}$$

so that, except for the additive constant $I_Q(0)$, the marginal information
$I_R$ tends to decrease at least as fast as $\ln t$ with time during the interaction.
This implies the relation for the correlation:

$$\{Q,R\}_t = I_{QR}(t) - I_Q(t) - I_R(t) \geq I_{QR}(t) - I_Q(t) - I_Q(0) + \ln t . \tag{3.9}$$

But at $t = 0$ the distributions for $R$ and $Q$ were independent, so that
$I_{QR}(0) = I_R(0) + I_Q(0)$. Substitution of this relation, (3.5), and (3.6) into
(3.9) then yields the final result:

$$\{Q,R\}_t \geq I_R(0) - I_Q(0) + \ln t . \tag{3.10}$$

Therefore the correlation is built up at least as fast as $\ln t$, except for
an additive constant representing the difference of the information of the
initial distributions $P_2(r)$ and $P_1(q)$. Since the correlation goes to infinity
with increasing time, and the marginal system distribution is not
changed, the interaction (3.1) satisfies our definition of a measurement of
$q$ by $r$.

Even though the apparatus does not indicate any definite system value
(since there are no independent system or apparatus states), one can
nevertheless look upon the total wave function (3.2) as a superposition of
pairs of subsystem states, each element of which has a definite $q$ value
and a correspondingly displaced apparatus state.^15 Thus we can write
(3.2) as:

$$\psi_t^{S+A} = \int \phi(q')\,\delta(q - q')\,\eta(r - q't)\, dq' , \tag{3.11}$$

which is a superposition of states $\psi_{q'} = \delta(q - q')\,\eta(r - q't)$. Each of these
elements, $\psi_{q'}$, of the superposition describes a state in which the system
has the definite value $q = q'$, and in which the apparatus has a state
that is displaced from its original state by the amount $q't$. These elements
are then superposed with coefficients $\phi(q')$ to form the total
state (3.11).

15 See discussion of relative states, p. 38.

Conversely, if we transform to the representation where the apparatus
is definite, we write (3.2) as:

$$\psi_t^{S+A} = \int (1/N_{r'})\,\xi^{r'}(q)\,\delta(r - r')\, dr' , \tag{3.12}$$

where $\xi^{r'}(q) = N_{r'}\,\phi(q)\,\eta(r' - qt)$

and $(1/N_{r'})^2 = \int \phi^*(q)\phi(q)\,\eta^*(r' - qt)\,\eta(r' - qt)\, dq$.

Then the $\xi^{r'}(q)$ are the relative system state functions for the apparatus
states $\delta(r - r')$ of definite value $r = r'$.

We notice that these relative system states, $\xi^{r'}(q)$, are nearly eigenstates
for the values $q = r'/t$, if the degree of correlation between $q$ and
$r$ is sufficiently high, i.e., if $t$ is sufficiently large, or $\eta(r)$ sufficiently
sharp (near $\delta(r)$) then $\xi^{r'}(q)$ is nearly $\delta(q - r'/t)$.
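Both the logarithmic growth of the correlation in (3.10) and the sharpening of the relative system states can be followed explicitly when $P_1(q)$ and $P_2(r)$ are normal distributions. That Gaussian choice, the spreads `sigma_q` and `sigma_r`, and the function names below are assumptions made for this sketch; for such distributions every information quantity has a closed form, and the square amplitude of $\xi^{r'}(q)$ is again normal.

```python
import math

def gaussian_info(sigma):
    """Information of a normal distribution: ln(1 / (sigma sqrt(2 pi e)))."""
    return -math.log(sigma * math.sqrt(2.0 * math.pi * math.e))

def correlation(t, sigma_q, sigma_r):
    """{Q,R}_t = I_QR - I_Q - I_R for Gaussian P_1 and P_2 (assumed example)."""
    sigma_r_t = math.sqrt(sigma_r ** 2 + (t * sigma_q) ** 2)   # spread of P_t(r) from (3.7)
    i_qr = gaussian_info(sigma_q) + gaussian_info(sigma_r)      # constant in time, by (3.5)
    return i_qr - gaussian_info(sigma_q) - gaussian_info(sigma_r_t)

def lower_bound(t, sigma_q, sigma_r):
    """Right-hand side of (3.10): I_R(0) - I_Q(0) + ln t."""
    return gaussian_info(sigma_r) - gaussian_info(sigma_q) + math.log(t)

def relative_state_spread(t, sigma_q, sigma_r):
    """Standard deviation of q in the relative state for a definite pointer value r'."""
    return sigma_q * sigma_r / math.sqrt(sigma_r ** 2 + (t * sigma_q) ** 2)

sigma_q, sigma_r = 1.0, 0.5          # assumed spreads of P_1(q) and P_2(r)
times = [0.1, 1.0, 10.0, 100.0]
rows = [(t, correlation(t, sigma_q, sigma_r), lower_bound(t, sigma_q, sigma_r),
         relative_state_spread(t, sigma_q, sigma_r)) for t in times]
```

In each row the correlation stays above the bound and approaches it for large `t`, while the spread of the relative state shrinks roughly as `sigma_r / t`.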

This property, that the relative system states become approximate
eigenstates of the measurement, is in fact common to all measurements.
If we adopt as a measure of the nearness of a state $\psi$ to being an eigenfunction
of an operator $A$ the information $I_A(\psi)$, which is reasonable
because $I_A(\psi)$ measures the sharpness of the distribution of $A$ for $\psi$,
then it is a consequence of our definition of a measurement that the relative
system states tend to become eigenstates as the interaction proceeds.
Since $\mathrm{Exp}[I_Q^r] = I_Q + \{Q,R\}$, and $I_Q$ remains constant while $\{Q,R\}$
tends toward its maximum (or infinity) during the interaction, we have that
$\mathrm{Exp}[I_Q^r]$ tends to a maximum (or infinity). But $I_Q^r$ is just the information
in the relative system states, which we have adopted as a measure of the
nearness to an eigenstate. Therefore, at least in expectation, the relative
system states approach eigenstates.
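As a worked instance of the relation $\mathrm{Exp}[I_Q^r] = I_Q + \{Q,R\}$, suppose (an assumption made only for this illustration) that $P_1(q)$ and $P_2(r)$ are normal distributions with standard deviations $\sigma_1$ and $\sigma_2$, evolving under the interaction (3.1). Then every quantity has a closed form and the identity can be read off term by term:

```latex
% Assumption: P_1(q) and P_2(r) are normal with standard deviations \sigma_1, \sigma_2.
\[
\begin{aligned}
I_Q &= \ln\frac{1}{\sigma_1\sqrt{2\pi e}}, \qquad
\{Q,R\}_t = \tfrac{1}{2}\ln\Big(1 + \frac{t^2\sigma_1^2}{\sigma_2^2}\Big), \\
s_t^2 &= \frac{\sigma_1^2\sigma_2^2}{\sigma_2^2 + t^2\sigma_1^2}
  \quad\text{(variance of } q \text{ in } |\xi^{r'}(q)|^2\text{, the same for every } r'\text{)}, \\
\mathrm{Exp}[I_Q^r] &= \ln\frac{1}{s_t\sqrt{2\pi e}}
  = \ln\frac{1}{\sigma_1\sqrt{2\pi e}} + \tfrac{1}{2}\ln\frac{\sigma_2^2 + t^2\sigma_1^2}{\sigma_2^2}
  = I_Q + \{Q,R\}_t .
\end{aligned}
\]
```

Since $s_t \to 0$ as $t \to \infty$, the information of the relative system states grows without limit in this example, which is the sense in which they approach eigenstates of $q$.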

We have seen that (3.12) is a superposition of states $\psi_{r'}$, for each
of which the apparatus has recorded a definite value $r'$, and the system
is left in approximately the eigenstate of the measurement corresponding
to $q = r'/t$. The discontinuous "jump" into an eigenstate is thus only a
relative proposition, dependent upon our decomposition of the total wave
function into the superposition, and relative to a particularly chosen apparatus
value. So far as the complete theory is concerned all elements of
the superposition exist simultaneously, and the entire process is quite
continuous.

We have here only a special case of the following general principle 
which will hold for any situation which is treated entirely wave mechani- 
cally: 

PRINCIPLE. For any situation in which the existence of a property $R_i$
for a subsystem $S_1$ of a composite system S will imply the later property
$Q_i$ for S, then it is also true that an initial state for $S_1$ of the form
$\psi^{S_1} = \sum_i a_i \psi^{S_1}_{[R_i]}$, which is a superposition of states
with the properties $R_i$, will result in a later state for S of the form
$\psi^{S} = \sum_i a_i \psi^{S}_{[Q_i]}$, which is also a superposition, of states
with the property $Q_i$. That is, for any arrangement of an interaction
between two systems $S_1$ and $S_2$, which has the property that each
initial state $\phi_i^{S_1} \psi^{S_2}$ will result in a final situation with total
state $\psi_i^{S_1+S_2}$, an initial state of $S_1$ of the form
$\sum_i a_i \phi_i^{S_1}$ will lead, after interaction, to the superposition
$\sum_i a_i \psi_i^{S_1+S_2}$ for the whole system.

This follows immediately from the superposition principle for solutions 
of a linear wave equation. It therefore holds for any system of quantum 
mechanics for which the superposition principle holds, both particle and 
field theories, relativistic or not, and is applicable to all physical sys- 
tems, regardless of size. 

This principle has the far reaching implication that for any possible
measurement, for which the initial system state is not an eigenstate, the
resulting state of the composite system leads to no definite system state
nor any definite apparatus state. The system will not be put into one or
another of its eigenstates with the apparatus indicating the corresponding
value, and nothing resembling Process 1 can take place.

To see that this is indeed the case, suppose that we have a measuring
arrangement with the following properties. The initial apparatus state
is $\psi_0^A$. If the system is initially in an eigenstate of the measurement,
$\phi_i^S$, then after a specified time of interaction the total state
$\phi_i^S \psi_0^A$ will be transformed into a state $\phi_i^S \psi_i^A$, i.e., the
system eigenstate shall not be disturbed, and the apparatus state is changed
to $\psi_i^A$, which is different for each $\phi_i^S$. ($\psi_i^A$ may for example
be a state describing the apparatus as indicating, by the position of a
meter needle, the eigenvalue of $\phi_i^S$.) However, if the initial system
state is not an eigenstate but a superposition $\sum_i a_i \phi_i^S$, then the
final composite system state is also a superposition,
$\sum_i a_i \phi_i^S \psi_i^A$. This follows from the superposition principle since
all we need do is superpose our solutions for the eigenstates,
$\phi_i^S \psi_0^A \to \phi_i^S \psi_i^A$, to arrive at the solution,
$\sum_i a_i \phi_i^S \psi_0^A \to \sum_i a_i \phi_i^S \psi_i^A$, for the general case.
Thus in general after a measurement has been performed there will be no
definite system state nor any definite apparatus state, even though there
is a correlation. It seems as though nothing can ever be settled by such a
measurement. Furthermore this result is independent of the size of the
apparatus, and remains true for apparatus of quite macroscopic dimensions.
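
The argument can be illustrated in a finite-dimensional toy model. The sketch below is an assumption-laden example rather than anything taken from the text: the system and apparatus are each given n levels, and the interaction is taken to be a permutation that shifts the pointer by the system index, so that each eigenstate $\phi_i^S \psi_0^A$ goes to $\phi_i^S \psi_i^A$. Linearity then fixes what happens to a superposition.

```python
import numpy as np

def measurement_unitary(n):
    """Unitary on system (x) apparatus (each n-dimensional) that sends
    |i>|0> to |i>|i>: the pointer is shifted by the system index (mod n)."""
    U = np.zeros((n * n, n * n))
    for i in range(n):
        for k in range(n):
            U[i * n + (k + i) % n, i * n + k] = 1.0
    return U

n = 3
U = measurement_unitary(n)
a = np.array([0.6, 0.0, 0.8])            # system amplitudes a_i (normalised)
apparatus_ready = np.eye(n)[0]           # initial apparatus state psi_0^A
initial = np.kron(a, apparatus_ready)    # (sum_i a_i phi_i) psi_0^A
final = (U @ initial).reshape(n, n)      # final[i, k]: amplitude of phi_i psi_k^A

# A product state would give a matrix of rank 1. Rank 2 here means that neither
# the system nor the apparatus has a state of its own after the interaction.
rank = np.linalg.matrix_rank(final)
```

The final amplitude matrix is diagonal with entries $a_i$, i.e. the state $\sum_i a_i \phi_i^S \psi_i^A$, and nothing in the construction depends on how large the apparatus space is made.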

Suppose, for example, that we coupled a spin measuring device to a 
cannonball, so that if the spin is up the cannonball will be shifted one 
foot to the left, while if the spin is down it will be shifted an equal dis- 
tance to the right. If we now perform a measurement with this arrangement 
upon a particle whose spin is a superposition of up and down, then the 
resulting total state will also be a superposition of two states, one in 
which the cannonball is to the left, and one in which it is to the right. 
There is no definite position for our macroscopic cannonball! 

This behavior seems to be quite at variance with our observations,
since macroscopic objects always appear to us to have definite positions.
Can we reconcile this prediction of the purely wave mechanical theory
with experience, or must we abandon it as untenable? In order to answer
this question we must consider the problem of observation itself within
the framework of the theory.



IV. OBSERVATION 



We shall now give an abstract treatment of the problem of observation. 
In keeping with the spirit of our investigation of the consequences of pure 
wave mechanics we have no alternative but to introduce observers, con- 
sidered as purely physical systems, into the theory. 

We saw in the last chapter that in general a measurement (coupling of 
system and apparatus) had the outcome that neither the system nor the 
apparatus had any definite state after the interaction — a result seemingly 
at variance with our experience. However, we do not do justice to the 
theory of pure wave mechanics until we have investigated what the theory 
itself says about the appearance of phenomena to observers, rather than 
hastily concluding that the theory must be incorrect because the actual 
states of systems as given by the theory seem to contradict our observa- 
tions. 

We shall see that the introduction of observers can be accomplished 
in a reasonable manner, and that the theory then predicts that the appear- 
ance of phenomena, as the subjective experience of these observers, is 
precisely in accordance with the predictions of the usual probabilistic 
interpretation of quantum mechanics. 

§1. Formulation of the problem 

We are faced with the task of making deductions about the appearance 
of phenomena on a subjective level, to observers which are considered as 
purely physical systems and are treated within the theory. In order to 
accomplish this it is necessary to identify some objective properties of 
such an observer (states) with subjective knowledge (i.e., perceptions). 
Thus, in order to say that an observer O has observed the event $\alpha$, it
is necessary that the state of O has become changed from its former
state to a new state which is dependent upon $\alpha$.

It will suffice for our purposes to consider our observers to possess 
memories (i.e., parts of a relatively permanent nature whose states are in 
correspondence with the past experience of the observer). In order to 
make deductions about the subjective experience of an observer it is suf- 
ficient to examine the contents of the memory. 

As models for observers we can, if we wish, consider automatically 
functioning machines, possessing sensory apparata and coupled to re- 
cording devices capable of registering past sensory data and machine 
configurations. We can further suppose that the machine is so constructed 
that its present actions shall be determined not only by its present sen- 
sory data, but by the contents of its memory as well. Such a machine will 
then be capable of performing a sequence of observations (measurements), 
and furthermore of deciding upon its future experiments on the basis of 
past results. We note that if we consider that current sensory data, as 
well as machine configuration, is immediately recorded in the memory, 
then the actions of the machine at a given instant can be regarded as a 
function of the memory contents only, and all relevant experience of the 
machine is contained in the memory. 

For such machines we are justified in using such phrases as "the 
machine has perceived A" or "the machine is aware of A" if the occur- 
rence of A is represented in the memory, since the future behavior of 
the machine will be based upon the occurrence of A. In fact, all of the 
customary language of subjective experience is quite applicable to such 
machines, and forms the most natural and useful mode of expression when 
dealing with their behavior, as is well known to individuals who work 
with complex automata. 
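
A minimal classical sketch of such a machine may make the idea concrete. The class name, its methods, and the decision policy below are all hypothetical choices made for illustration. The only feature taken from the text is that sensory data are recorded immediately and that the next action is a function of the memory contents alone.

```python
class MemoryMachine:
    """Toy automaton whose next action depends only on its memory contents."""

    def __init__(self):
        self.memory = []                      # ordered record of past sensory data

    def observe(self, datum):
        self.memory.append(datum)             # data are recorded immediately

    def has_perceived(self, event):
        return event in self.memory           # "the machine is aware of event"

    def next_experiment(self):
        # Hypothetical policy: repeat a measurement until it has given the same
        # result twice in a row, then move on to a different one.
        if len(self.memory) >= 2 and self.memory[-1] == self.memory[-2]:
            return "measure B"
        return "measure A"

machine = MemoryMachine()
machine.observe("A = +1")
first_choice = machine.next_experiment()      # only one record so far
machine.observe("A = +1")
second_choice = machine.next_experiment()     # two identical records
```

Statements such as "the machine has perceived A" then reduce to assertions about `memory`, which is the identification the text relies on.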

When dealing quantum mechanically with a system representing an
observer we shall ascribe a state function, $\psi^O$, to it. When the state
$\psi^O$ describes an observer whose memory contains representations of the
events A,B,...,C we shall denote this fact by appending the memory
sequence in brackets as a subscript, writing:

$\psi^O_{[A,B,\ldots,C]}$ .

The symbols A,B,...,C, which we shall assume to be ordered time wise, 
shall therefore stand for memory configurations which are in correspond- 
ence with the past experience of the observer. These configurations can 
be thought of as punches in a paper tape, impressions on a magnetic reel, 
configurations of a relay switching circuit, or even configurations of brain 
cells. We only require that they be capable of the interpretation "The 
observer has experienced the succession of events A,B,...,C." (We shall 
sometimes write dots in a memory sequence, [...A,B,...,C], to indicate 
the possible presence of previous memories which are irrelevant to the 
case being considered.) 

Our problem is, then, to treat the interaction of such observer-systems 
with other physical systems (observations), within the framework of wave 
mechanics, and to deduce the resulting memory configurations, which we 
can then interpret as the subjective experiences of the observers. 

We begin by defining what shall constitute a "good" observation. A
good observation of a quantity A, with eigenfunctions $\{\phi_i\}$ for a
system S, by an observer whose initial state is $\psi^O_{[\ldots]}$, shall
consist of an interaction which, in a specified period of time, transforms
each (total) state

$\psi^{S+O} = \phi_i\, \psi^O_{[\ldots]}$

into a new state

$\psi'^{S+O} = \phi_i\, \psi^O_{[\ldots,\alpha_i]}$

where $\alpha_i$ characterizes the state $\phi_i$. (It might stand for a
recording of the eigenvalue, for example.) That is, our requirement is that
the system state, if it is an eigenstate, shall be unchanged, and that the
observer state shall change so as to describe an observer that is "aware"
of which eigenfunction it is, i.e., some property is recorded in the memory
of the observer which characterizes $\phi_i$, such as the eigenvalue. The
requirement that the eigenstates for the system be unchanged is necessary
if the observation is to be significant (repeatable), and the requirement
that the observer state change in a manner which is different for each
eigenfunction is necessary if we are to be able to call the interaction an
observation at all.

§2. Deductions 

From these requirements we shall first deduce the result of an
observation upon a system which is not in an eigenstate of the observation.
We know, by our previous remark upon what constitutes a good observation,
that the interaction transforms states $\phi_i \psi^O_{[\ldots]}$ into states
$\phi_i \psi^O_{[\ldots,\alpha_i]}$. Consequently we can simply superpose these
solutions of the wave equation to arrive at the final state for the case of
an arbitrary initial system state. Thus if the initial system state is not
an eigenstate, but a general state $\sum_i a_i \phi_i$, we get for the final
total state:

(2.1)  $\psi'^{S+O} = \sum_i a_i \phi_i \psi^O_{[\ldots,\alpha_i]}$ .

This remains true also in the presence of further systems which do
not interact for the time of measurement. Thus, if systems
$S_1,S_2,\ldots,S_n$ are present as well as O, with original states
$\psi^{S_1},\psi^{S_2},\ldots,\psi^{S_n}$, and the only interaction during the time
of measurement is between $S_1$ and O, the result of the measurement will
be the transformation of the initial total state:

$\psi^{S_1+S_2+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots]}$

into the final state:

(2.2)  $\psi'^{S_1+S_2+\cdots+S_n+O} = \sum_i a_i \phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1]}$ ,

where $a_i = (\phi_i^{S_1},\psi^{S_1})$ and $\phi_i^{S_1}$ are eigenfunctions of the
observation.

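Thus we arrive at the general rule for the transformation of total state
functions which describe systems within which observation processes occur:

Rule 1. The observation of a quantity A, with eigenfunctions $\phi_i^{S_1}$,
in a system $S_1$ by the observer O, transforms the total state according to:

$\psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots]} \to \sum_i a_i \phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1]}$ ,

where $a_i = (\phi_i^{S_1},\psi^{S_1})$.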

If we next consider a second observation to be made, where our total 
state is now a superposition, we can apply Rule 1 separately to each ele- 
ment of the superposition, since each element separately obeys the wave 
equation and behaves independently of the remaining elements, and then 
superpose the results to obtain the final solution. We formulate this as: 

Rule 2. Rule 1 may be applied separately to each element of a
superposition of total system states, the results being superposed to obtain
the final total state. Thus, a determination of B, with eigenfunctions
$\eta_j^{S_2}$, on $S_2$ by the observer O transforms the total state

$\sum_i a_i \phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1]}$

into the state

$\sum_{i,j} a_i b_j \phi_i^{S_1}\eta_j^{S_2}\psi^{S_3}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1,\beta_j^2]}$ ,

where $b_j = (\eta_j^{S_2},\psi^{S_2})$, which follows from the application of
Rule 1 to each element $\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1]}$, and
then superposing the results with the coefficients $a_i$.

These two rules, which follow directly from the superposition principle,
give us a convenient method for determining final total states for any
number of observation processes in any combinations. We must now seek 
the interpretation of such final total states. 
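
The two rules lend themselves to a simple bookkeeping sketch. The representation below is an assumption made for illustration and is not notation from the text: a total state is a list of branches, each holding an amplitude, the current states of the systems, and the observer's memory sequence. The function `observe` and the memory entries `(label, system, index)` are hypothetical names.

```python
import numpy as np

def observe(branches, s, basis, label):
    """Apply Rule 1 to every branch (Rule 2) for an observation on system s.

    branches: list of (amplitude, system_states, memory); system_states is a
              tuple of state vectors, memory a tuple of recorded results.
    basis:    rows are the eigenfunctions of the observed quantity.
    """
    new_branches = []
    for amp, states, memory in branches:
        for i, phi in enumerate(basis):
            coeff = np.vdot(phi, states[s])        # inner product (phi_i, psi^{S_s})
            if abs(coeff) < 1e-12:
                continue                            # drop zero-amplitude elements
            new_states = states[:s] + (phi,) + states[s + 1:]
            new_branches.append((amp * coeff, new_states, memory + ((label, s, i),)))
    return new_branches

z_basis = np.eye(2)
psi = np.array([0.6, 0.8])
start = [(1.0, (psi, psi), ())]                          # psi^{S_1} psi^{S_2} psi^O_[...]
after_first = observe(start, 0, z_basis, "A")            # Rule 1 on S_1
after_second = observe(after_first, 1, z_basis, "A")     # Rule 2 on S_2
after_repeat = observe(after_second, 0, z_basis, "A")    # S_1 observed again
```

In this toy run the second observation doubles the number of branches, while repeating the observation on the first system adds a memory entry to each branch without creating new ones, since each branch already holds an eigenstate.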

Let us consider the simple case of a single observation of a quantity
A, with eigenfunctions $\phi_i$, in the system S with initial state $\psi^S$, by
an observer O whose initial state is $\psi^O_{[\ldots]}$. The final result is,
as we have seen, the superposition:

(2.3)  $\psi'^{S+O} = \sum_i a_i \phi_i \psi^O_{[\ldots,\alpha_i]}$ .

We note that there is no longer any independent system state or observer
state, although the two have become correlated in a one-one manner.
However, in each element of the superposition (2.3),
$\phi_i \psi^O_{[\ldots,\alpha_i]}$, the object-system state is a particular eigenstate
of the observation, and furthermore the observer-system state describes the
observer as definitely perceiving that particular system state.¹ It is this
correlation which allows one to maintain the interpretation that a
measurement has been performed.

We now carry the discussion a step further and allow the observer-system
to repeat the observation. Then according to Rule 2 we arrive at
the total state after the second observation:

(2.4)  $\psi''^{S+O} = \sum_i a_i \phi_i \psi^O_{[\ldots,\alpha_i,\alpha_i]}$ .

¹ At this point we encounter a language difficulty. Whereas before the
observation we had a single observer state, afterwards there were a number
of different states for the observer, all occurring in a superposition. Each
of these separate states is a state for an observer, so that we can speak of
the different observers described by the different states. On the other
hand, the same physical system is involved, and from this viewpoint it is
the same observer, which is in different states for different elements of
the superposition (i.e., has had different experiences in the separate
elements of the superposition). In this situation we shall use the singular
when we wish to emphasize that a single physical system is involved, and the
plural when we wish to emphasize the different experiences for the separate
elements of the superposition. (E.g., "The observer performs an observation
of the quantity A, after which each of the observers of the resulting
superposition has perceived an eigenvalue.")

Again, we see that each element of (2.4), $\phi_i \psi^O_{[\ldots,\alpha_i,\alpha_i]}$,
describes a system eigenstate, but this time also describes the observer as
having obtained the same result for each of the two observations. Thus for
every separate state of the observer in the final superposition, the result
of the observation was repeatable, even though different for different
states. This repeatability is, of course, a consequence of the fact that
after an observation the relative system state for a particular observer
state is the corresponding eigenstate.

Let us suppose now that an observer-system O, with initial state
$\psi^O_{[\ldots]}$, measures the same quantity A in a number of separate
identical systems which are initially in the same state,
$\psi^{S_1} = \psi^{S_2} = \cdots = \psi^{S_n} = \sum_i a_i \phi_i$ (where the $\phi_i$ are,
as usual, eigenfunctions of A). The initial total state function is then

(2.3)  $\psi_0^{S_1+S_2+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots]}$ .

We shall assume that the measurements are performed on the systems in
the order $S_1,S_2,\ldots,S_n$. Then the total state after the first
measurement will be, by Rule 1,

(2.4)  $\psi_1^{S_1+S_2+\cdots+S_n+O} = \sum_i a_i \phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1]}$

(where $\alpha_i^1$ refers to the first system, $S_1$).
After the second measurement it will be, by Rule 2,

(2.5)  $\psi_2^{S_1+S_2+\cdots+S_n+O} = \sum_{i,j} a_i a_j \phi_i^{S_1}\phi_j^{S_2}\psi^{S_3}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2]}$

and in general, after r measurements have taken place ($r \leq n$) Rule 2
gives the result:

(2.6)  $\psi_r = \sum_{i,j,\ldots,k} a_i a_j \cdots a_k\, \phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\psi^{S_{r+1}}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]}$ .

We can give this state, $\psi_r$, the following interpretation. It consists
of a superposition of states:

(2.7)  $\psi'_{ij\ldots k} = \phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\psi^{S_{r+1}}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]}$

each of which describes the observer with a definite memory sequence
$[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$, and relative to whom the (observed)
system states are the corresponding eigenfunctions
$\phi_i^{S_1},\phi_j^{S_2},\ldots,\phi_k^{S_r}$, the remaining systems,
$S_{r+1},\ldots,S_n$, being unaltered.

In the language of subjective experience, the observer which is
described by a typical element, $\psi'_{ij\ldots k}$, of the superposition has
perceived an apparently random sequence of definite results for the
observations. It is furthermore true, since in each element the system has
been left in an eigenstate of the measurement, that if at this stage a
redetermination of an earlier system observation ($S_\ell$) takes place, every
element of the resulting final superposition will describe the observer
with a memory configuration of the form
$[\ldots,\alpha_i^1,\ldots,\alpha_j^\ell,\ldots,\alpha_k^r,\alpha_j^\ell]$ in which the earlier
memory coincides with the later, i.e., the memory states are correlated. It
will thus appear to the observer which is described by a typical element of
the superposition that each initial observation on a system caused the
system to "jump" into an eigenstate in a random fashion and thereafter
remain there for subsequent measurements on the same system. Therefore,
qualitatively, at least, the probabilistic assertions of Process 1 appear to
be valid to the observer described by a typical element of the final
superposition.

In order to establish quantitative results, we must put some sort of
measure (weighting) on the elements of a final superposition. This is
necessary to be able to make assertions which will hold for almost all of
the observers described by elements of a superposition. In order to make
quantitative statements about the relative frequencies of the different
possible results of observation which are recorded in the memory of a
typical observer we must have a method of selecting a typical observer.

Let us therefore consider the search for a general scheme for assigning
a measure to the elements of a superposition of orthogonal states
$\sum_i a_i \phi_i$. We require then a positive function $\mathcal{M}$ of the complex
coefficients of the elements of the superposition, so that $\mathcal{M}(a_i)$
shall be the measure assigned to the element $\phi_i$. In order that this
general scheme shall be unambiguous we must first require that the states
themselves always be normalized, so that we can distinguish the
coefficients from the states. However, we can still only determine the
coefficients, in distinction to the states, up to an arbitrary phase factor,
and hence the function $\mathcal{M}$ must be a function of the amplitudes of the
coefficients alone (i.e., $\mathcal{M}(a_i) = \mathcal{M}(\sqrt{a_i^* a_i})$), in order to
avoid ambiguities.

If we now impose the additivity requirement that if we regard a subset
of the superposition, say $\sum_{i=1}^{n} a_i \phi_i$, as a single element
$\alpha\phi'$:

(2.8)  $\alpha\phi' = \sum_{i=1}^{n} a_i \phi_i$ ,

then the measure assigned to $\phi'$ shall be the sum of the measures
assigned to the $\phi_i$ (i from 1 to n):

(2.9)  $\mathcal{M}(\alpha) = \sum_i \mathcal{M}(a_i)$ ,

then we have already restricted the choice of $\mathcal{M}$ to the square
amplitude alone. ($\mathcal{M}(a_i) = a_i^* a_i$, apart from a multiplicative
constant.)

To see this we note that the normality of $\phi'$ requires that
$|\alpha| = \sqrt{\sum_i a_i^* a_i}$. From our remarks upon the dependence of
$\mathcal{M}$ upon the amplitude alone, we replace the $a_i$ by their amplitudes
$\mu_i = |a_i|$. (2.9) then requires that

(2.10)  $\mathcal{M}(\alpha) = \mathcal{M}\bigl(\sqrt{\sum_i a_i^* a_i}\bigr) = \mathcal{M}\bigl(\sqrt{\sum_i \mu_i^2}\bigr) = \sum_i \mathcal{M}(\mu_i) = \sum_i \mathcal{M}\bigl(\sqrt{\mu_i^2}\bigr)$ .

Defining a new function g(x):

(2.11)  $g(x) = \mathcal{M}(\sqrt{x})$ ,

we see that (2.10) requires that

(2.12)  $g\bigl(\sum_i \mu_i^2\bigr) = \sum_i g(\mu_i^2)$ ,

so that g is restricted to be linear and necessarily has the form:

(2.13)  $g(x) = cx$  (c constant) .

Therefore $g(x^2) = cx^2 = \mathcal{M}(\sqrt{x^2}) = \mathcal{M}(x)$ and we have deduced
that $\mathcal{M}$ is restricted to the form

(2.14)  $\mathcal{M}(a_i) = \mathcal{M}(\mu_i) = c\mu_i^2 = c\,a_i^* a_i$ ,

and we have shown that the only choice of measure consistent with our 
additivity requirement is the square amplitude measure, apart from an arbi- 
trary multiplicative constant which may be fixed, if desired, by normaliza- 
tion requirements. (The requirement that the total measure be unity implies 
that this constant is 1.) 
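
The additivity requirement (2.9) is easy to test numerically for candidate measures. The sketch below is only a spot check on random coefficients, not a proof. The helper `is_additive` and the three candidate functions are names introduced here for illustration.

```python
import math
import random

def is_additive(measure, trials=200, tol=1e-9):
    """Check M(alpha) = sum_i M(a_i) with |alpha| = sqrt(sum_i |a_i|^2)
    on randomly drawn sets of complex coefficients."""
    rng = random.Random(0)
    for _ in range(trials):
        coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1))
                  for _ in range(rng.randint(2, 6))]
        alpha = math.sqrt(sum(abs(a) ** 2 for a in coeffs))
        gap = abs(measure(alpha) - sum(measure(a) for a in coeffs))
        if gap > tol * (1 + alpha ** 2):
            return False
    return True

square_amplitude = lambda a: abs(a) ** 2     # the measure singled out by (2.14)
plain_amplitude = lambda a: abs(a)           # a competing candidate
fourth_power = lambda a: abs(a) ** 4         # another competing candidate

results = {
    "square": is_additive(square_amplitude),
    "plain": is_additive(plain_amplitude),
    "fourth": is_additive(fourth_power),
}
```

Of these three candidates only the square amplitude passes, in line with the deduction above.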

The situation here is fully analogous to that of classical statistical
mechanics, where one puts a measure on trajectories of systems in the
phase space by placing a measure on the phase space itself, and then
making assertions which hold for "almost all" trajectories (such as
ergodicity, quasi-ergodicity, etc.).² This notion of "almost all" depends
here also upon the choice of measure, which is in this case taken to be
Lebesgue measure on the phase space. One could, of course, contradict
the statements of classical statistical mechanics by choosing a measure
for which only the exceptional trajectories had nonzero measure.
Nevertheless the choice of Lebesgue measure on the phase space can be
justified by the fact that it is the only choice for which the "conservation
of probability" holds (Liouville's theorem), and hence the only choice which
makes possible any reasonable statistical deductions at all.

² See Khinchin [16].

In our case, we wish to make statements about "trajectories" of ob- 
servers. However, for us a trajectory is constantly branching (transform- 
ing from state to superposition) with each successive measurement. To 
have a requirement analogous to the "conservation of probability" in the 
classical case, we demand that the measure assigned to a trajectory at 
one time shall equal the sum of the measures of its separate branches at 
a later time. This is precisely the additivity requirement which we im- 
posed and which leads uniquely to the choice of square-amplitude measure. 
Our procedure is therefore quite as justified as that of classical statisti- 
cal mechanics. 

Having deduced that there is a unique measure which will satisfy our
requirements, the square-amplitude measure, we continue our deduction.
This measure then assigns to the i,j,...,k-th element of the superposition
(2.6),

(2.15)  $\phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\psi^{S_{r+1}}\cdots\psi^{S_n}\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]}$ ,

the measure (weight)

(2.16)  $M_{ij\ldots k} = (a_i a_j \cdots a_k)^* (a_i a_j \cdots a_k)$ ,

so that the observer state with memory configuration
$[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$ is assigned the measure
$a_i^* a_i\, a_j^* a_j \cdots a_k^* a_k = M_{ij\ldots k}$. We see immediately that
this is a product measure, namely

(2.17)  $M_{ij\ldots k} = M_i M_j \cdots M_k$ ,

where  $M_\ell = a_\ell^* a_\ell$ ,

so that the measure assigned to a particular memory sequence
$[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$ is simply the product of the measures
for the individual components of the memory sequence.

We notice now a direct correspondence of our measure structure to the
probability theory of random sequences. Namely, if we were to regard the
$M_{ij\ldots k}$ as probabilities for the sequences
$[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$, then the sequences are equivalent to
the random sequences which are generated by ascribing to each term the
independent probabilities $M_\ell = a_\ell^* a_\ell$. Now the probability theory is
equivalent to measure theory mathematically, so that we can make use of it,
while keeping in mind that all results should be translated back to measure
theoretic language.

Thus, in particular, if we consider the sequences to become longer
and longer (more and more observations performed) each memory sequence
of the final superposition will satisfy any given criterion for a randomly
generated sequence, generated by the independent probabilities
$a_\ell^* a_\ell$, except for a set of total measure which tends toward zero as
the number of observations becomes unlimited. Hence all averages of
functions over any memory sequence, including the special case of
frequencies, can be computed from the probabilities $a_\ell^* a_\ell$, except for
a set of memory sequences of measure zero. We have therefore shown that the
statistical assertions of Process 1 will appear to be valid to almost all
observers described by separate elements of the superposition (2.6), in the
limit as the number of observations goes to infinity.
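
The statement about frequencies can be made concrete for a two-outcome measurement. The sketch below sums the product measure (2.17) over all memory sequences of length n whose relative frequency of the first result differs from $a_1^* a_1$ by more than a tolerance. The function name, the value 0.36 and the tolerance 0.05 are illustrative assumptions.

```python
from math import comb

def atypical_measure(p, n, eps):
    """Total square-amplitude measure of the length-n memory sequences in which
    the relative frequency of the first result differs from p by more than eps.

    p stands for a_1* a_1 in a two-outcome measurement.
    """
    total = 0.0
    for k in range(n + 1):
        if abs(k / n - p) > eps:
            # comb(n, k) sequences, each of product measure p^k (1 - p)^(n - k)
            total += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return total

# Measure of the "exceptional" sequences for increasingly long memories.
measures = [atypical_measure(0.36, n, 0.05) for n in (10, 100, 1000)]
```

The total measure of the exceptional sequences falls as n grows, which is the measure-theoretic content of the claim that almost all observers find the Process 1 frequencies.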

While we have so far considered only sequences of observations of
the same quantity upon identical systems, the result is equally true for
arbitrary sequences of observations. For example, the sequence of
observations of the quantities $A^1, A^2,\ldots,A^n,\ldots$ with (generally
different) eigenfunction sets $\{\phi_i^1\}, \{\phi_j^2\},\ldots,\{\phi_k^n\},\ldots$
applied successively to the systems $S_1,S_2,\ldots,S_n,\ldots$, with
(arbitrary) initial states $\psi^{S_1},\psi^{S_2},\ldots,\psi^{S_n},\ldots$ transforms the
total initial state:

(2.18)  $\psi^{S_1+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\psi^O_{[\ldots]}$

by Rules 1 and 2, into the final state:

(2.19)  $\psi'^{S_1+S_2+\cdots+S_n+O} = \sum_{i,j,\ldots,k} (\phi_i^1,\psi^{S_1})(\phi_j^2,\psi^{S_2})\cdots(\phi_k^n,\psi^{S_n})\, \phi_i^1\phi_j^2\cdots\phi_k^n\, \psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^n,\ldots]}$ ,

where the memory sequence element $\alpha_\ell^r$ characterizes the $\ell$-th
eigenfunction, $\phi_\ell^r$, of the operator $A^r$. Again the square amplitude
measure for each element of the superposition (2.19) reduces to the product
measure of the individual memory element measures,
$|(\phi_\ell^r,\psi^{S_r})|^2$ for the memory sequence element $\alpha_\ell^r$.
Therefore, the memory sequence of a typical element of (2.19) has all the
characteristics of a random sequence, with individual, independent (and now
different) probabilities $|(\phi_\ell^r,\psi^{S_r})|^2$ for the r-th memory state.

Finally, we can generalize to the case where several observations are
allowed to be performed upon the same system. For example, if we permit
the observation of a new quantity $B^r$ (eigenfunctions $\eta_m^r$, memory
characterization $\beta_m^r$) upon the system $S_r$ for which $A^r$ has already
been observed, then the state (2.19):

(2.20)  $\psi' = \sum_{i,\ldots,\ell,\ldots,k} (\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})\, \phi_i^1\cdots\phi_\ell^r\cdots\phi_k^n\, \psi^O_{[\ldots,\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n]}$

is transformed by Rule 2 into the state:

(2.21)  $\psi'' = \sum_{i,\ldots,\ell,\ldots,k,m} (\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)\, \phi_i^1\cdots\eta_m^r\cdots\phi_k^n\, \psi^O_{[\ldots,\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n,\beta_m^r]}$ .

The relative system states for $S_r$ have been changed from the eigenstates
of $A^r$, $\{\phi_\ell^r\}$, to the eigenstates of $B^r$, $\{\eta_m^r\}$. We notice
further that, with respect to our measure on the superposition, the memory
sequences still have the character of random sequences, but of random
sequences for which the individual terms are no longer independent. The
memory states $\beta_m^r$ now depend upon the memory states $\alpha_\ell^r$ which
represent the result of the previous measurement upon the same system,
$S_r$. The joint (normalized) measure for this pair of memory states,
conditioned by fixed values for remaining memory states is:

(2.22)  $M(\alpha_\ell^r,\beta_m^r) = \dfrac{|(\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)|^2}{\sum_{\ell,m} |(\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)|^2}$ .

The joint measure (2.22) is, first of all, independent of the memory
states for the remaining systems ($S_1 \ldots S_n$ excluding $S_r$). Second,
the dependence of $\beta_m^r$ on $\alpha_\ell^r$ is equivalent, measure
theoretically, to that given by the stochastic process³ which converts the
states $\phi_\ell^r$ into the states $\eta_m^r$ with transition probabilities:

(2.23)  $T_{\ell m} = \mathrm{Prob.}(\phi_\ell^r \to \eta_m^r) = |(\eta_m^r,\phi_\ell^r)|^2$ .

³ Cf. Chapter II, §6.

If we were to allow yet another quantity C to be measured in $S_r$, the
new memory states corresponding to the eigenfunctions of C would
have a similar dependence upon the previous states $\beta_m^r$, but no direct
dependence on the still earlier states $\alpha_\ell^r$. This dependence upon only
the previous result of observation is a consequence of the fact that the
relative system states are completely determined by the last observation.
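
The dependence on only the previous result can be checked directly from the measure. The sketch below assumes a spin-1/2 system with A, B, C having eigenfunctions along z, x, z; that choice, the initial state and all variable names are illustrative and not from the text. It builds the joint measure of a memory triple from the squared coefficients that Rule 2 produces, then conditions the last entry on the first two.

```python
import numpy as np

z_basis = np.array([[1.0, 0.0], [0.0, 1.0]])                 # eigenfunctions of A and C
x_basis = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)   # eigenfunctions of B
psi = np.array([0.6, 0.8])                                   # initial state of S_r

def transition(from_basis, to_basis):
    """T[l, m] = |(eta_m, phi_l)|^2, as in (2.23); rows of each basis are states."""
    return np.abs(from_basis @ to_basis.conj().T) ** 2

p_alpha = np.abs(z_basis.conj() @ psi) ** 2    # |(phi_l, psi)|^2 for the first result
T_ab = transition(z_basis, x_basis)
T_bc = transition(x_basis, z_basis)

# Joint measure of the memory triple (alpha_l, beta_m, gamma_p).
joint = p_alpha[:, None, None] * T_ab[:, :, None] * T_bc[None, :, :]

# Measure of gamma conditioned on (alpha, beta); it should not depend on alpha.
conditional = joint / joint.sum(axis=2, keepdims=True)
```

In this example the conditional measure of the third memory entry equals the transition matrix from B to C for either value of the first entry, which is the Markov-like property described above.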

We can therefore summarize the situation for an arbitrary sequence of 
observations, upon the same or different systems in any order, and for 
which the number of observations of each quantity in each system is very 
large, with the following result: 

Except for a set of memory sequences of measure nearly zero, the 
averages of any functions over a memory sequence can be calculated 
approximately by the use of the independent probabilities given by Process 
1 for each initial observation on a system, and by the use of the
transition probabilities (2.23) for succeeding observations upon the same system.
In the limit, as the number of all types of observations goes to infinity the 
calculation is exact, and the exceptional set has measure zero. 

This prescription for the calculation of averages over memory sequen- 
ces by probabilities assigned to individual elements is precisely that of 
the orthodox theory (Process 1). Therefore all predictions of the usual 
theory will appear to be valid to the observer in almost all observer states, 
since these predictions hold for almost all memory sequences. 

In particular, the uncertainty principle is never violated, since, as 
above, the latest measurement upon a system supplies all possible infor- 
mation about the relative system state, so that there is no direct correla- 
tion between any earlier results of observation on the system, and the 
succeeding observation. Any observation of a quantity B, between two 
successive observations of quantity A (all on the same system) will 
destroy the one-one correspondence between the earlier and later memory 
states for the result of A. Thus for alternating observations of different 
quantities there are fundamental limitations upon the correlations between 
memory states for the same observed quantity, these limitations expressing 
the content of the uncertainty principle. 
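
A small numerical sketch shows the loss of correlation. It assumes a spin-1/2 system with A along z and B along x, and computes the total measure of the memory sequences in which two observations of A agree, with and without an intervening observation of B. The function name and the choice of bases are illustrative assumptions.

```python
import numpy as np

def agreement_measure(psi, a_basis, b_basis=None):
    """Total measure of the memory sequences in which two observations of A
    on the same system agree, optionally with an observation of B in between."""
    p_first = np.abs(a_basis.conj() @ psi) ** 2              # weights of the first result
    if b_basis is None:
        t = np.abs(a_basis @ a_basis.conj().T) ** 2           # A directly followed by A
    else:
        t_ab = np.abs(a_basis @ b_basis.conj().T) ** 2        # A -> B transitions
        t_ba = np.abs(b_basis @ a_basis.conj().T) ** 2        # B -> A transitions
        t = t_ab @ t_ba                                       # summed over the B result
    return float(np.sum(p_first * np.diag(t)))

z_basis = np.eye(2)
x_basis = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
psi = np.array([0.6, 0.8])

repeat_only = agreement_measure(psi, z_basis)                 # A, A
with_b_between = agreement_measure(psi, z_basis, x_basis)     # A, B, A
```

Without B the two records of A agree on the whole superposition. With the x-observation in between, agreement holds only on half of the total measure, so the one-one correspondence between the earlier and later memory states is lost.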
In conclusion, we have described in this section processes involving
an idealized observer, processes which are entirely deterministic and con- 
tinuous from the over-all viewpoint (the total state function is presumed 
to satisfy a wave equation at all times) but whose result is a superposi- 
tion, each element of which describes the observer with a different memory 
state. We have seen that in almost all of these observer states it appears 
to the observer that the probabilistic aspects of the usual form of quantum 
theory are valid. We have thus seen how pure wave mechanics, without 
any initial probability assertions, can lead to these notions on a subjec- 
tive level, as appearances to observers. 

§3. Several observers 

We shall now consider the consequences of our scheme when several
observers are allowed to interact with the same systems, as well as with
one another (communication). In the following discussion observers shall
be denoted by $O_1,O_2,\ldots$, other systems by $S_1,S_2,\ldots$, and observables
by operators A, B, C, with eigenfunctions $\{\phi_i\}$, $\{\eta_j\}$, $\{\xi_k\}$
respectively. The symbols $\alpha_i$, $\beta_j$, $\gamma_k$, occurring in memory
sequences shall refer to characteristics of the states $\phi_i$, $\eta_j$,
$\xi_k$, respectively. ($\psi^{O_j}_{[\ldots,\alpha_i]}$ is interpreted as describing an
observer, $O_j$, who has just observed the eigenvalue corresponding to
$\phi_i$, i.e., who is "aware" that the system is in state $\phi_i$.)

We shall also wish to allow communication among the observers, which 
we view as an interaction by means of which the memory sequences of 
different observers become correlated. (For example, the transfer of im- 
pulses from the magnetic tape memory of one mechanical observer to that 
of another constitutes such a transfer of information.) 4 We shall regard 
these processes as observations made by one observer on another and 
shall use the notation that

    $\psi^{O_j}_{[\ldots,\alpha_i^{O_k}]}$

represents a state function describing an observer $O_j$ who has obtained
the information $\alpha_i$ from another observer, $O_k$. Thus the obtaining of
information about $A$ from $O_1$ by $O_2$ will transform the state

    $\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}$

into the state

(3.1)    $\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i^{O_1}]}$ .

Rules 1 and 2 are, of course, equally applicable to these interactions. We
shall now illustrate the possibilities for several observers, by considering
several cases.

4 We assume that such transfers merely duplicate, but do not destroy, the
original information.

Case 1: We allow two observers to separately observe the same quantity
in a system, and then compare results.

We suppose that first observer $O_1$ observes the quantity $A$ for the
system $S$. Then by Rule 1 the original state

    $\psi^{S+O_1+O_2} = \psi^S\,\psi^{O_1}_{[\ldots]}\,\psi^{O_2}_{[\ldots]}$

is transformed into the state

(3.2)    $\psi' = \sum_i (\phi_i,\psi^S)\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}$ .

We now suppose that $O_2$ observes $A$, and by Rule 2 the state becomes:

(3.3)    $\psi'' = \sum_i (\phi_i,\psi^S)\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i]}$ .

We now allow $O_2$ to "consult" $O_1$, which leads in the same fashion
from (3.1) and Rule 2 to the final state

(3.4)    $\psi''' = \sum_i (\phi_i,\psi^S)\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i,\alpha_i^{O_1}]}$ .

Thus, for every element of the superposition the information obtained
from $O_1$ agrees with that obtained directly from the system. This means
that observers who have separately observed the same quantity will always
agree with each other.

Furthermore, it is obvious at this point that the same result, (3.4), is
obtained if $O_2$ first consults $O_1$, then performs the direct observation,
except that the memory sequence for $O_2$ is reversed ($[\ldots,\alpha_i^{O_1},\alpha_i]$
instead of $[\ldots,\alpha_i,\alpha_i^{O_1}]$). There is still perfect agreement in every
element of the superposition. Therefore, information obtained from another
observer is always reliable, since subsequent direct observation will always
verify it. We thus see the central role played by correlations in wave
functions for the preservation of consistency in situations where several
observers are allowed to consult one another. It is the transitivity of
correlation in these cases (that if $S_1$ is correlated to $S_2$, and $S_2$ to
$S_3$, then so is $S_1$ to $S_3$) which is responsible for this consistency.
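
The bookkeeping of Case 1 can be followed mechanically. The sketch below is only an illustration under stated assumptions: the system is taken to be two-dimensional, the coefficients are arbitrary sample values of the overlaps with the system state, and the names `observe_A` and `consult` are hypothetical helpers, not notation from the text. Each branch of the superposition is stored together with the memory sequences of both observers.

```python
# Branch bookkeeping for Case 1. A branch is (i, memory_O1, memory_O2):
# the system is in the i-th eigenstate of A and each observer carries a
# tuple of recorded results. The coefficients are illustrative sample
# values of the overlaps (phi_i, psi^S), not taken from the text.
coefficients = [0.6, 0.8j]

initial = {(i, (), ()): c for i, c in enumerate(coefficients)}

def observe_A(state, observer):
    """Observer 1 or 2 records the eigenvalue index of A in each branch."""
    new_state = {}
    for (i, mem1, mem2), amp in state.items():
        if observer == 1:
            new_state[(i, mem1 + (i,), mem2)] = amp
        else:
            new_state[(i, mem1, mem2 + (i,))] = amp
    return new_state

def consult(state):
    """O2 copies O1's latest entry; O1's own memory is left intact."""
    return {(i, mem1, mem2 + (("from O1", mem1[-1]),)): amp
            for (i, mem1, mem2), amp in state.items()}

final = consult(observe_A(observe_A(initial, 1), 2))

# In every branch, O2's direct result equals the result reported by O1.
all_agree = all(mem2[0] == mem2[1][1] for (_, _, mem2) in final)
total_measure = sum(abs(amp) ** 2 for amp in final.values())
print(all_agree, total_measure)
```

No branch is ever created in which the two entries of the second observer's memory differ, which is the agreement asserted above.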

Case 2: We allow two observers to measure separately two different, non- 
commuting quantities in the same system. 

Assume that first $O_1$ observes $A$ for the system, so that, as before,
the initial state $\psi^S\,\psi^{O_1}_{[\ldots]}\,\psi^{O_2}_{[\ldots]}$ is transformed to:

(3.5)    $\psi' = \sum_i (\phi_i,\psi^S)\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}$ .

Next let $O_2$ determine $B$ for the system, where $\{\eta_j\}$ are the
eigenfunctions of $B$. Then by application of Rule 2 the result is

(3.6)    $\psi'' = \sum_{i,j} (\phi_i,\psi^S)(\eta_j,\phi_i)\,\eta_j\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\beta_j]}$ .

$O_2$ is now perfectly correlated with the system, since a redetermination
by him will lead to agreeing results. This is no longer the case for $O_1$,
however, since a redetermination of $A$ by him will result in (by Rule 2)

(3.7)    $\psi''' = \sum_{i,j,k} (\phi_i,\psi^S)(\eta_j,\phi_i)(\phi_k,\eta_j)\,\phi_k\,\psi^{O_1}_{[\ldots,\alpha_i,\alpha_k]}\,\psi^{O_2}_{[\ldots,\beta_j]}$ .

Hence the second measurement of $O_1$ does not in all cases agree
with the first, and has been upset by the intervention of $O_2$.

We can deduce the statistical relation between $O_1$'s first and second
results ($\alpha_i$ and $\alpha_k$) by our previous method of assigning a measure to
the elements of the superposition (3.7). The measure assigned to the
$(i,j,k)$th element is then:

(3.8)    $M_{ijk} = |(\phi_i,\psi^S)(\eta_j,\phi_i)(\phi_k,\eta_j)|^2$ .

This measure is equivalent, in this case, to the probabilities assigned by
the orthodox theory (Process 1), where $O_2$'s observation is regarded as
having converted each state $\phi_i$ into a non-interfering mixture of states
$\eta_j$, weighted with probabilities $|(\eta_j,\phi_i)|^2$, upon which $O_1$ makes his
second observation.

Note, however, that this equivalence with the statistical results obtained
by considering that $O_2$'s observation changed the system state
into a mixture, holds true only so long as $O_1$'s second observation is
restricted to the system. If he were to attempt to simultaneously determine
a property of the system as well as of $O_2$, interference effects
might become important. The description of the states relative to $O_1$,
after $O_2$'s observation, as non-interfering mixtures is therefore incomplete.
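
As a numerical illustration of the measure (3.8), the following sketch evaluates it for a two-dimensional system. The choice of bases (the eigenfunctions of the first quantity along the coordinate axes, those of the second rotated by 45 degrees) and the initial system state are assumptions made purely for the example.

```python
import math

def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

r = 1 / math.sqrt(2)
phi = [(1 + 0j, 0j), (0j, 1 + 0j)]           # assumed eigenfunctions of A
eta = [(r + 0j, r + 0j), (r + 0j, -r + 0j)]  # assumed eigenfunctions of B
psi_S = (math.sqrt(0.8) + 0j, math.sqrt(0.2) + 0j)  # illustrative system state

# Measure of the (i, j, k) element of the superposition, as in (3.8).
M = {}
for i in range(2):
    for j in range(2):
        for k in range(2):
            amplitude = (inner(phi[i], psi_S)
                         * inner(eta[j], phi[i])
                         * inner(phi[k], eta[j]))
            M[(i, j, k)] = abs(amplitude) ** 2

total = sum(M.values())
# Total measure of the elements in which O1's two results for A differ.
p_changed = sum(m for (i, j, k), m in M.items() if i != k)
print(total, p_changed)
```

With these maximally incompatible bases, half of the total measure lies on elements in which the second result for the first quantity disagrees with the first result, which is the disturbance described above.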

Case 3: We suppose that two systems $S_1$ and $S_2$ are correlated but no
longer interacting, and that $O_1$ measures property $A$ in $S_1$, and $O_2$
property $B$ in $S_2$.

We wish to see whether $O_2$'s intervention with $S_2$ can in any way
affect $O_1$'s results in $S_1$, so that perhaps signals might be sent by
these means. We shall assume that the initial state for the system pair is

(3.9)    $\psi^{S_1+S_2} = \sum_i a_i\,\phi_i^{S_1}\phi_i^{S_2}$ .

We now allow $O_1$ to observe $A$ in $S_1$, so that after this observation
the total state becomes:

(3.10)   $\psi' = \sum_i a_i\,\phi_i^{S_1}\phi_i^{S_2}\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}$ .

$O_1$ can of course continue to repeat the determination, obtaining the
same result each time.

We now suppose that $O_2$ determines $B$ in $S_2$, which results in

(3.11)   $\psi'' = \sum_{i,j} a_i\,(\eta_j^{S_2},\phi_i^{S_2})\,\phi_i^{S_1}\eta_j^{S_2}\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\beta_j]}$ .

However, in this case, as distinct from Case 2, we see that the intervention
of $O_2$ in no way affects $O_1$'s determinations, since $O_1$ is
still perfectly correlated to the states $\phi_i^{S_1}$ of $S_1$, and any further
observations by $O_1$ will lead to the same results as the earlier observations.
Thus each memory sequence for $O_1$ continues without change due to
$O_2$'s observation, and such a scheme could not be used to send any
signals.

Furthermore, we see that the result (3.11) is arrived at even in the
case that $O_2$ should make his determination before that of $O_1$. Therefore
any expectations for the outcome of $O_1$'s first observation are in no
way affected by whether or not $O_2$ performs his observation before that
of $O_1$. This is true because the expectation of the outcome for $O_1$ can
be computed from (3.10), which is the same whether or not $O_2$ performs
his measurement before or after $O_1$.

It is therefore seen that one observer's observation upon one system 
of a correlated, but non-interacting pair of systems, has no effect on the 
remote system, in the sense that the outcome or expected outcome of any 
experiments by another observer on the remote system are not affected. 
Paradoxes like that of Einstein-Rosen-Podolsky 5 which are concerned 
with such correlated, non-interacting, systems are thus easily understood 
in the present scheme. 
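
The absence of signalling in Case 3 can be checked numerically. The sketch below assumes a pair of two-dimensional systems with real coefficients, and lets the second observer measure in a basis rotated by an arbitrary angle `theta` (a hypothetical parameter standing for his free choice of quantity). The total measure of each of the first observer's results is then computed from the elements of (3.11).

```python
import math

def o1_marginal(a, theta):
    """Measure of each O1 result after O2 measures in a basis rotated by theta."""
    phi_s2 = [(1.0, 0.0), (0.0, 1.0)]             # correlated states of S2
    eta_s2 = [(math.cos(theta), math.sin(theta)),  # assumed eigenfunctions of B
              (-math.sin(theta), math.cos(theta))]
    joint = {}
    for i in range(2):
        for j in range(2):
            overlap = sum(e * p for e, p in zip(eta_s2[j], phi_s2[i]))
            joint[(i, j)] = abs(a[i] * overlap) ** 2
    # Sum over O2's results j to obtain the measure of O1's result i.
    return [joint[(i, 0)] + joint[(i, 1)] for i in range(2)]

a = [math.sqrt(0.3), math.sqrt(0.7)]   # illustrative coefficients a_i
unmeasured = [abs(c) ** 2 for c in a]  # measures when O2 does nothing
marginals = [o1_marginal(a, theta) for theta in (0.0, 0.4, 1.1, math.pi / 2)]
print(unmeasured, marginals)
```

Whatever angle is chosen, the measures of the first observer's results stay equal to the squared magnitudes of the coefficients, so the choice made at the second system carries no information to the first.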

Many further combinations of several observers and systems can be 
easily studied in the present framework, and all questions answered by 
first writing down the final state for the situation with the aid of the 
Rules 1 and 2, and then noticing the relations between the elements of 
the memory sequences. 



5 Einstein [8].



V. SUPPLEMENTARY TOPICS 



We have now completed the abstract treatment of measurement and 
observation, with the deduction that the statistical predictions of the 
usual form of quantum theory (Process 1) will appear to be valid to all 
observers. We have therefore succeeded in placing our theory in corre- 
spondence with experience, at least insofar as the ordinary theory cor- 
rectly represents experience. 

We should like to emphasize that this deduction was carried out by 
using only the principle of superposition, and the postulate that an obser- 
vation has the property that if the observed variable has a definite value 
in the object-system then it will remain definite and the observer will per- 
ceive this value. This treatment is therefore valid for any possible quan- 
tum interpretation of observation processes, i.e., any way in which one 
can interpret wave functions as describing observers, as well as for any 
form of quantum mechanics for which the superposition principle for states 
is maintained. Our abstract discussion of observation is therefore logi- 
cally complete, in the sense that our results for the subjective experience 
of observers are correct, if there are any observers at all describable by 
wave mechanics. 1 

In this chapter we shall consider a number of diverse topics from the 
point of view of our pure wave mechanics, in order to supplement the ab- 
stract discussion and give a feeling for the new viewpoint. Since we are 
now mainly interested in elucidating the reasonableness of the theory, we 
shall often restrict ourselves to plausibility arguments, rather than de- 
tailed proofs. 



1 They are, of course, vacuously correct otherwise.

§1. Macroscopic objects and classical mechanics

In the light of our knowledge about the atomic constitution of matter, 
any "object" of macroscopic size is composed of an enormous number of 
constituent particles. The wave function for such an object is then in a 
space of fantastically high dimension (3N, if N is the number of parti- 
cles). Our present problem is to understand the existence of macroscopic 
objects, and to relate their ordinary (classical) behavior in the three di- 
mensional world to the underlying wave mechanics in the higher dimension- 
al space. 

Let us begin by considering a relatively simple case. Suppose that 
we place in a box an electron and a proton, each in a definite momentum 
state, so that the position amplitude density of each is uniform over the 
whole box. After a time we would expect a hydrogen atom in the ground 
state to form, with ensuing radiation. We notice, however, that the posi- 
tion amplitude density of each particle is still uniform over the whole box. 
Nevertheless the amplitude distributions are now no longer independent, 
but correlated. In particular, the conditional amplitude density for the 
electron, conditioned by any definite proton (or centroid) position, is not 
uniform, but is given by the familiar ground state wave function for the 
hydrogen atom. What we mean by the statement, "a hydrogen atom has 
formed in the box," is just that this correlation has taken place — a corre- 
lation which insures that the relative configuration for the electron, for a 
definite proton position, conforms to the customary ground state configura- 
tion. 
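
A toy model may make the distinction between marginal and conditional distributions concrete. In the sketch below the box is replaced by a ring of `N` lattice sites and the bound state by an exponentially decaying profile in the relative coordinate; both are assumptions chosen only for simplicity and are not the actual hydrogen wave function.

```python
import math

N = 12  # number of lattice sites on a ring (toy stand-in for the box)

def relative_amplitude(d):
    """Toy bound-state profile depending only on the ring distance."""
    d = min(d % N, (-d) % N)
    return math.exp(-d)

# Joint amplitude over (electron site, proton site), normalized.
raw = {(e, p): relative_amplitude(e - p) for e in range(N) for p in range(N)}
norm = math.sqrt(sum(v * v for v in raw.values()))
amp = {key: v / norm for key, v in raw.items()}

# Marginal density of the electron: uniform over the whole ring.
electron_marginal = [sum(amp[(e, p)] ** 2 for p in range(N)) for e in range(N)]

# Conditional density of the electron for a definite proton position.
proton_at_0 = sum(amp[(e, 0)] ** 2 for e in range(N))
electron_given_proton_at_0 = [amp[(e, 0)] ** 2 / proton_at_0 for e in range(N)]
print(electron_marginal[0], electron_given_proton_at_0[0])
```

The marginal is flat, while the conditional density is concentrated near the proton: the "atom" exists only as a correlation.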

The wave function for the hydrogen atom can be represented as a 
product of a centroid wave function and a wave function over relative 
coordinates, where the centroid wave function obeys the wave equation 
for a particle with mass equal to the total mass of the proton-electron sys- 
tem. Therefore, if we now open our box, the centroid wave function will 
spread with time in the usual manner of wave packets, to eventually occu- 
py a vast region of space. The relative configuration (described by the 
relative coordinate state function) has, however, a permanent nature, since 
it represents a bound state, and it is this relative configuration which we
usually think of as the object called the hydrogen atom. Therefore, no 
matter how indefinite the positions of the individual particles become in 
the total state function (due to the spreading of the centroid), this state 
can be regarded as giving (through the centroid wave function) an ampli- 
tude distribution over a comparatively definite object, the tightly bound 
electron-proton system. The general state, then, does not describe any 
single such definite object, but a superposition of such cases with the 
object located at different positions. 

In a similar fashion larger and more complex objects can be built up 
through strong correlations which bind together the constituent particles. 
It is still true that the general state function for such a system may lead 
to marginal position densities for any single particle (or centroid) which 
extend over large regions of space. Nevertheless we can speak of the 
existence of a relatively definite object, since the specification of a 
single position for a particle, or the centroid, leads to the case where the 
relative position densities of the remaining particles are distributed 
closely about the specified one, in a manner forming the comparatively 
definite object spoken of. 

Suppose, for example, we begin with a cannonball located at the origin, 
described by a state function: 

    $\psi_{[C(0,0,0)]}$ ,

where the subscript indicates that the total state function if/ describes a 
system of particles bound together so as to form an object of the size and 
shape of a cannonball, whose centroid is located (approximately) at the 
origin, say in the form of a real gaussian wave packet of small dimensions, 
with variance $\sigma_0^2$ for each dimension.

If we now allow a long lapse of time, the centroid of the system will
spread in the usual manner to occupy a large region of space. (The spread
in each dimension after time $t$ will be given by $\sigma_t^2 = \sigma_0^2 + (\hbar^2 t^2/4\sigma_0^2 m^2)$,
where $m$ is the mass.) Nevertheless, for any specified centroid position,
the particles, since they remain in bound states, have distributions which 
again correspond to the fairly well defined size and shape of the cannonball.
Thus the total state can be regarded as a (continuous) superposition
of states

    $\psi = \int a_{xyz}\,\psi_{[C(x,y,z)]}\,dx\,dy\,dz$ ,

each of which ($\psi_{[C(x,y,z)]}$) describes a cannonball at the position
$(x,y,z)$. The coefficients $a_{xyz}$ of the superposition then correspond to
the centroid distribution.

It is not true that each individual particle spreads independently of
the rest, in which case we would have a final state which is a grand
superposition of states in which the particles are located independently
everywhere. The fact that they are in bound states restricts our final state to
a superposition of "cannonball" states. The wave function for the centroid
can therefore be taken as a representative wave function for the
whole object.

It is thus in this sense of correlations between constituent particles
that definite macroscopic objects can exist within the framework of pure
wave mechanics. The building up of correlations in a complex system
supplies us with a mechanism which also allows us to understand how
condensation phenomena (the formation of spatial boundaries which separate
phases of different physical or chemical properties) can be controlled
by the wave equation, answering a point raised by Schrödinger.

Classical mechanics, also, enters our scheme in the form of correlation
laws. Let us consider a system of objects (in the previous sense),
such that the centroid of each object has initially a fairly well defined
position and momentum (e.g., let the wave function for the centroids consist
of a product of gaussian wave packets). As time progresses, the
centers of the square amplitude distributions for the objects will move in
a manner approximately obeying the laws of motion of classical mechanics, 
with the degree of approximation depending upon the masses and the 
length of time considered, as is well known. (Note that we do not mean 
to imply that the wave packets of the individual objects remain indepen- 
dent if they are interacting. They do not. The motion that we refer to is 
that of the centers of the marginal distributions for the centroids of the 
bodies.) 
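
How strongly the approximation depends on the mass can be seen from the standard spreading law for a free gaussian packet, in which the variance after time t is the initial variance plus (hbar t / 2 sigma_0 m) squared. The sketch below evaluates it for two cases; the particular masses, initial widths and times are illustrative assumptions, not values from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s

def sigma_t(sigma_0, mass, t):
    """Width of a free gaussian packet of initial width sigma_0 after time t."""
    return math.sqrt(sigma_0 ** 2 + (HBAR * t / (2 * sigma_0 * mass)) ** 2)

one_year = 3.156e7  # seconds

# A 5 kg body whose centroid is initially localized to one micron.
cannonball = sigma_t(1e-6, 5.0, one_year)
# A free electron initially localized to one angstrom, after one nanosecond.
electron = sigma_t(1e-10, 9.1093837e-31, 1e-9)

cannonball_growth = cannonball / 1e-6 - 1  # fractional increase in width
electron_growth = electron / 1e-10         # ratio of final to initial width
print(cannonball_growth, electron_growth)
```

For the heavy body the width is unchanged to within rounding after a year, whereas the electron packet grows by many orders of magnitude within a nanosecond, which is why the quasi-classical description is adequate only for large masses or short times.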

The general state of a system of macroscopic objects does not, how- 
ever, ascribe any nearly definite positions and momenta to the individual 
bodies. Nevertheless, any general state can at any instant be analyzed 
into a superposition of states each of which does represent the bodies 
with fairly well defined positions and momenta. Each of these states 
then propagates approximately according to classical laws, so that the 
general state can be viewed as a superposition of quasi-classical states 
propagating according to nearly classical trajectories. In other words, if 
the masses are large or the time short, there will be strong correlations 
between the initial (approximate) positions and momenta and those at a 
later time, with the dependence being given approximately by classical 
mechanics. 

For any $\epsilon$ one can construct a complete orthonormal set of (one particle)
states $\phi_{\mu,\nu}$, where the double index $\mu,\nu$ refers to the approximate
position and momentum, and for which the expected position and momentum
values run independently through sets of approximately uniform density,
such that the position and momentum uncertainties, $\sigma_x$ and $\sigma_p$, satisfy
$\sigma_x \le C\epsilon$ and $\sigma_p \le C\hbar/2\epsilon$ for each $\phi_{\mu,\nu}$, where $C$ is a
constant $\sim 60$. The uncertainty product then satisfies
$\sigma_x\sigma_p \le C^2\hbar/2$, about 3,600 times the minimum allowable, but still
sufficiently low for macroscopic objects. This set can then be used as a
basis for our decomposition into states where every body has a roughly
defined position and momentum. For a more complete discussion of this set
see von Neumann [17], pp. 406-407.

Since large scale objects obeying classical laws have a place in our
theory of pure wave mechanics, we have justified the introduction of
models for observers consisting of classically describable, automatically
functioning machinery, and the treatment of observation of Chapter IV is
non-vacuous.

Let us now consider the result of an observation (considered along 
the lines of Chapter IV) performed upon a system of macroscopic bodies 
in a general state. The observer will not become aware of the fact that
the state does not correspond to definite positions and momenta (i.e., he 
will not see the objects as "smeared out" over large regions of space) 
but will himself simply become correlated with the system — after the ob- 
servation the composite system of objects + observer will be in a super- 
position of states, each element of which describes an observer who has 
perceived that the objects have nearly definite positions and momenta, 
and for whom the relative system state is a quasi-classical state in the 
previous sense, and furthermore to whom the system will appear to behave 
according to classical mechanics if his observation is continued. We see, 
therefore, how the classical appearance of the macroscopic world to us 
can be explained in the wave theory. 

§2. Amplification processes 

In Chapters III and IV we discussed abstract measuring processes,
which were considered to be simply a direct coupling between two sys- 
tems, the object-system and the apparatus (or observer). There is, how- 
ever, in actuality a whole chain of intervening systems linking a micro- 
scopic system to a macroscopic observer. Each link in the chain of inter- 
vening systems becomes correlated to its predecessor, so that the result 
is an amplification of effects from the microscopic object-system to a 
macroscopic apparatus, and then to the observer. 

The amplification process depends upon the ability of the state of one 
micro-system (particle, for example) to become correlated with the states 
of an enormous number of other microscopic systems, the totality of which 
we shall call a detection system. For example, the totality of gas atoms 
in a Geiger counter, or the water molecules in a cloud chamber, constitute 
such a detection system. 

The amplification is accomplished by arranging the condition of the
detection system so that the states of the individual micro-systems of the
detector are metastable, in a way that if one micro-system should fall from
its metastable state it would influence the reduction of others. This type 
of arrangement leaves the entire detection system metastable against 
chain reactions which involve a large number of its constituent systems. 
In a Geiger counter, for example, the presence of a strong electric field 
leaves the gas atoms metastable against ionization. Furthermore, the 
products of the ionization of one gas atom in a Geiger counter can cause 
further ionizations, in a cascading process. The operation of cloud cham- 
bers and photographic films is also due to metastability against such 
chain reactions. 

The chain reactions cause large numbers of the micro-systems of the 
detector to behave as a unit, all remaining in the metastable state, or all 
discharging. In this manner the states of a sufficiently large number of 
micro-systems are correlated, so that one can speak of the whole ensemble 
being in a state of discharge, or not. 

For example, there are essentially only two macroscopically distin- 
guishable states for a Geiger counter; discharged or undischarged. The 
correlation of large numbers of gas atoms, due to the chain reaction effect, 
implies that either very few, or else very many of the gas atoms are ionized 
at a given time. Consider the complete state function $\psi^G$ of a Geiger
counter, which is a function of all the coordinates of all of the constituent
particles. Because of the correlation of the behavior of a large number of
the constituent gas atoms, the total state $\psi^G$ can always be written as
a superposition of two states

(2.1)    $\psi^G = a_1\,\psi_{[D]} + a_2\,\psi_{[U]}$ ,

where $\psi_{[U]}$ signifies a state where only a small number of gas atoms
are ionized, and $\psi_{[D]}$ a state for which a large number are ionized.

To see that the decomposition (2.1) is valid, expand $\psi^G$ in terms of
individual gas atom stationary states:

(2.2)    $\psi^G = \sum_{i,j,\ldots,k} a_{ij\ldots k}\,\psi_i^{S_1}\psi_j^{S_2}\cdots\psi_k^{S_n}$ ,

where $\psi_l^{S_r}$ is the $l$th state of atom $r$. Each element of the
superposition (2.2)

(2.3)    $\psi_i^{S_1}\psi_j^{S_2}\cdots\psi_k^{S_n}$

must contain either a very large number of atoms in ionized states, or else
a very small number, because of the chain reaction effect. By choosing
some medium-sized number as a dividing line, each element of (2.2) can
be placed in one of the two categories, high number or low number of
ionized atoms. If we then carry out the sum (2.2) over only those elements
of the first category, we get a state (and coefficient)

(2.4)    $a_1\,\psi_{[D]} = {\sum}'_{ij\ldots k}\, a_{ij\ldots k}\,\psi_i^{S_1}\psi_j^{S_2}\cdots\psi_k^{S_n}$ .

The state $\psi_{[D]}$ is then a state where a large number of particles are
ionized. The subscript $[D]$ indicates that it describes a Geiger counter
which has discharged. If we carry out the sum over the remaining terms
of (2.2) we get in a similar fashion:

(2.5)    $a_2\,\psi_{[U]} = {\sum}''_{ij\ldots k}\, a_{ij\ldots k}\,\psi_i^{S_1}\psi_j^{S_2}\cdots\psi_k^{S_n}$ ,

where $[U]$ indicates the undischarged condition. Combining (2.4) and
(2.5) we arrive at the desired relation (2.1). So far, this method of
decomposition can be applied to any system, whether or not it has the chain
reaction property. However, in our case, more is implied, namely that the
spread of the number of ionized atoms in both $\psi_{[D]}$ and $\psi_{[U]}$ is
small compared to the separation of their averages, due to the fact that
the existence of the chain reactions means that either many or else few
atoms will be ionized, with the middle ground virtually excluded.

This type of decomposition is also applicable to all other detection 
devices which are based upon this chain reaction principle (such as cloud 
chambers, photo plates, etc.). 
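
The decomposition into a discharged and an undischarged part can be carried out explicitly for a small model. The sketch below is a toy under stated assumptions: six two-state "atoms", real amplitudes that suppress configurations with a middling number of ionized atoms (standing in for the chain reaction effect), and a dividing line at half the atoms. The helper `project` is hypothetical.

```python
import math
from itertools import product

n_atoms = 6
threshold = n_atoms / 2  # the "medium-sized number" used as a dividing line

def raw_amplitude(config):
    """Toy amplitude: middling numbers of ionized atoms are suppressed."""
    k = sum(config)  # number of ionized atoms (1 = ionized, 0 = not)
    return math.exp(-3.0 * min(k, n_atoms - k))

configs = list(product((0, 1), repeat=n_atoms))
norm = math.sqrt(sum(raw_amplitude(c) ** 2 for c in configs))
psi_G = {c: raw_amplitude(c) / norm for c in configs}

def project(state, keep):
    """Restrict the sum to one category; return (coefficient, normalized state)."""
    part = {c: a for c, a in state.items() if keep(c)}
    weight = math.sqrt(sum(abs(a) ** 2 for a in part.values()))
    return weight, {c: a / weight for c, a in part.items()}

a1, psi_D = project(psi_G, lambda c: sum(c) > threshold)   # discharged part
a2, psi_U = project(psi_G, lambda c: sum(c) <= threshold)  # undischarged part

# Recombine the two parts and compare with the original state.
recombined = {c: a1 * psi_D.get(c, 0.0) + a2 * psi_U.get(c, 0.0) for c in configs}
max_error = max(abs(recombined[c] - psi_G[c]) for c in configs)

# Average number of ionized atoms in each part.
mean_D = sum(sum(c) * a * a for c, a in psi_D.items())
mean_U = sum(sum(c) * a * a for c, a in psi_U.items())
print(a1, a2, mean_D, mean_U)
```

The two parts recombine to the original state, and their average ionization numbers lie near the two extremes, far apart compared with the spread within either part.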

We consider now the coupling of such a detection device to another 
micro-system (object-system) for the purpose of measurement. If it is true 
that the initial object-system state $\phi_1$ will at some time $t$ trigger the
chain reaction, so that the state of the counter becomes $\psi_{[D]}$, while the
object-system state $\phi_2$ will not, then it is still true that the initial
object-system state $a_1\phi_1 + a_2\phi_2$ will result in the superposition

(2.6)    $a_1\,\phi_1'\,\psi_{[D]} + a_2\,\phi_2'\,\psi_{[U]}$

at time $t$.

For example, let us suppose that a particle whose state is a wave
packet $\phi$, of linear extension greater than that of our Geiger counter,
approaches the counter. Just before it reaches the counter, it can be
decomposed into a superposition $\phi = a_1\phi_1 + a_2\phi_2$ ($\phi_1$, $\phi_2$ orthogonal)
where $\phi_1$ has non-zero amplitude only in the region before the counter
and $\phi_2$ has non-zero amplitude elsewhere (so that $\phi_1$ is a packet which
will entirely pass through the counter while $\phi_2$ will entirely miss the
counter). The initial total state for the system particle + counter is then:

    $\phi\,\psi_{[U]} = (a_1\phi_1 + a_2\phi_2)\,\psi_{[U]}$ ,

where $\psi_{[U]}$ is the initial (assumed to be undischarged) state of the
counter.

But at a slightly later time $\phi_1$ is changed to $\phi_1'$, after traversing
the counter and causing it to go into a discharged state $\psi_{[D]}$, while $\phi_2$
passes by into a state $\phi_2'$ leaving the counter in an undischarged state
$\psi_{[U]}$. Superposing these results, the total state at the later time is

(2.7)    $a_1\,\phi_1'\,\psi_{[D]} + a_2\,\phi_2'\,\psi_{[U]}$

in accordance with (2.6). Furthermore, the relative particle state for
$\psi_{[D]}$, $\phi_1'$, is a wave packet emanating from the counter, while the
relative state for $\psi_{[U]}$ is a wave with a "shadow" cast by the counter. The
counter therefore serves as an apparatus which performs an approximate
position measurement on the particle.
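
The step from the two special cases to the superposition uses nothing but linearity, which the following sketch makes explicit. The string labels and the table `TRANSITIONS` are hypothetical stand-ins for the particle and counter states and for the dynamics on the two basis situations; the amplitudes are sample values.

```python
# Assumed dynamics on the two basis situations described in the text.
TRANSITIONS = {
    ("phi_1", "U"): ("phi_1'", "D"),  # packet crosses the counter and discharges it
    ("phi_2", "U"): ("phi_2'", "U"),  # packet misses; counter stays undischarged
}

def evolve(state):
    """Apply the dynamics linearly: each basis term evolves with its amplitude."""
    final = {}
    for basis_term, amplitude in state.items():
        target = TRANSITIONS[basis_term]
        final[target] = final.get(target, 0) + amplitude
    return final

a1, a2 = 0.6, 0.8  # illustrative coefficients of the incoming packet
initial = {("phi_1", "U"): a1, ("phi_2", "U"): a2}
final = evolve(initial)
print(final)
```

The counter ends in neither state alone: the final dictionary holds both terms with the original amplitudes, as in (2.7).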

No matter what the complexity or exact mechanism of a measuring 
process, the general superposition principle as stated in Chapter III, §3,
remains valid, and our abstract discussion is unaffected. It is a vain hope 
that somewhere embedded in the intricacy of the amplification process is 
a mechanism which will somehow prevent the macroscopic apparatus state 
from reflecting the same indefiniteness as its object-system. 

§3. Reversibility and irreversibility 

Let us return, for the moment, to the probabilistic interpretation of 
quantum mechanics based on Process 1 as well as Process 2. Suppose 
that we have a large number of identical systems (ensemble), and that the
$j$th system is in the state $\psi^j$. Then for purposes of calculating
expectation values for operators over the ensemble, the ensemble is represented
by the mixture of states $\psi^j$ weighted with $1/N$, where $N$ is the number
of systems, for which the density operator is:

(3.1)    $\rho = \frac{1}{N}\sum_j [\psi^j]$ ,

where $[\psi^j]$ denotes the projection operator on $\psi^j$. This density operator,
in turn, is equivalent to a density operator which is a sum of projections
on orthogonal states (the eigenstates of $\rho$): 4

(3.2)    $\rho = \sum_i P_i\,[\eta_i]$ ,   $(\eta_i,\eta_j) = \delta_{ij}$ ,   $\sum_i P_i = 1$ ,

so that any ensemble is always equivalent to a mixture of orthogonal
states, which representation we shall henceforth assume.

Cf. Chapter III, §1.

4 See Chapter III, §2, particularly footnote 6, p. 46.

Suppose that a quantity $A$, with (non-degenerate) eigenstates $\{\phi_j\}$
is measured in each system of the ensemble. This measurement has the
effect of transforming each state $\eta_i$ into the state $\phi_j$ with probability
$|(\phi_j,\eta_i)|^2$; i.e., it will transform a large ensemble of systems in the state
$\eta_i$ into an ensemble represented by the mixture whose density operator is
$\sum_j |(\phi_j,\eta_i)|^2\,[\phi_j]$. Extending this result to the case where the original
ensemble is a mixture of the $\eta_i$ weighted by $P_i$ ((3.2)), we find that the
density operator $\rho$ is transformed by the measurement of $A$ into the new
density operator $\rho'$:

(3.3)    $\rho' = \sum_i P_i \sum_j |(\phi_j,\eta_i)|^2\,[\phi_j] = \sum_j \Bigl(\sum_i P_i\,(\phi_j,\eta_i)(\eta_i,\phi_j)\Bigr)[\phi_j]$
         $= \sum_j \Bigl(\phi_j,\ \sum_i P_i[\eta_i]\,\phi_j\Bigr)[\phi_j] = \sum_j (\phi_j,\rho\,\phi_j)\,[\phi_j]$ .

This is the general law by which mixtures change through Process 1. 
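
The law (3.3) is easy to evaluate for a two-dimensional example. In the sketch below the orthogonal states of the mixture are rotated by 30 degrees relative to the eigenstates of the measured quantity, and the weights are 0.9 and 0.1; these numbers, and the helper names, are assumptions for illustration only.

```python
import math

def projector(v):
    """Matrix of [v], the projection on the normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]

def weighted_sum(weights, matrices):
    """Entrywise sum of the matrices with the given weights."""
    n = len(matrices[0])
    return [[sum(w * m[r][c] for w, m in zip(weights, matrices)) for c in range(n)]
            for r in range(n)]

def expectation(v, rho):
    """(v, rho v) for a vector v and matrix rho."""
    n = len(v)
    return sum(v[r].conjugate() * rho[r][c] * v[c] for r in range(n) for c in range(n))

theta = math.pi / 6
eta = [(complex(math.cos(theta)), complex(math.sin(theta))),
       (complex(-math.sin(theta)), complex(math.cos(theta)))]
phi = [(1 + 0j, 0j), (0j, 1 + 0j)]  # assumed eigenstates of the measured quantity
P = [0.9, 0.1]                      # illustrative weights of the mixture

rho = weighted_sum(P, [projector(v) for v in eta])
# Process 1: the new weights are the diagonal elements of rho in the phi basis.
new_weights = [expectation(v, rho).real for v in phi]
rho_after = weighted_sum(new_weights, [projector(v) for v in phi])
print(new_weights, rho[0][1], rho_after[0][1])
```

The off-diagonal element present before the measurement is absent afterwards, and the weights have moved from (0.9, 0.1) toward the uniform distribution.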

However, even when no measurements are taking place, the states of
an ensemble are changing according to Process 2, so that after a time
interval $t$ each state $\psi$ will be transformed into a state $\psi' = U_t\psi$,
where $U_t$ is a unitary operator. This natural motion has the consequence
that each mixture $\rho = \sum_i P_i[\eta_i]$ is carried into the mixture
$\rho' = \sum_i P_i[U_t\eta_i]$ after a time $t$. But for every state $\xi$,

(3.4)    $\rho'\xi = \sum_i P_i[U_t\eta_i]\,\xi = \sum_i P_i\,(U_t\eta_i,\xi)\,U_t\eta_i$
         $= U_t\sum_i P_i\,(\eta_i,U_t^{-1}\xi)\,\eta_i = U_t\sum_i P_i[\eta_i]\,(U_t^{-1}\xi)$
         $= (U_t\,\rho\,U_t^{-1})\,\xi$ .

Therefore

(3.5)    $\rho' = U_t\,\rho\,U_t^{-1}$ ,

which is the general law for the change of a mixture according to Process 2. 
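
The law (3.5) can be verified directly for a small example by evolving each state of the mixture separately and comparing with the conjugated density operator. The particular mixture and the particular unitary matrix below (a rotation combined with a phase) are assumptions chosen for illustration.

```python
import cmath
import math

def projector(v):
    """Matrix of [v], the projection on the normalized vector v."""
    return [[a * b.conjugate() for b in v] for a in v]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def dagger(X):
    """Conjugate transpose; for a unitary matrix this is its inverse."""
    n = len(X)
    return [[X[c][r].conjugate() for c in range(n)] for r in range(n)]

def apply(U, v):
    return tuple(sum(U[r][c] * v[c] for c in range(len(v))) for r in range(len(v)))

def mixture(weights, states):
    """Density matrix of the mixture of the given states with the given weights."""
    n = len(states[0])
    mats = [projector(s) for s in states]
    return [[sum(w * m[r][c] for w, m in zip(weights, mats)) for c in range(n)]
            for r in range(n)]

theta, alpha = math.pi / 6, 0.7
eta = [(complex(math.cos(theta)), complex(math.sin(theta))),
       (complex(-math.sin(theta)), complex(math.cos(theta)))]
P = [0.9, 0.1]
U = [[complex(math.cos(alpha)), -math.sin(alpha) * cmath.exp(0.3j)],
     [math.sin(alpha) * cmath.exp(-0.3j), complex(math.cos(alpha))]]

rho = mixture(P, eta)
rho_from_states = mixture(P, [apply(U, s) for s in eta])  # mixture of evolved states
rho_conjugated = matmul(matmul(U, rho), dagger(U))        # U rho U^{-1}

max_difference = max(abs(rho_from_states[r][c] - rho_conjugated[r][c])
                     for r in range(2) for c in range(2))
print(max_difference)
```

The two ways of computing the evolved mixture agree to rounding error.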

We are now interested in whether or not we can get from any mixture to
another by means of these two processes, i.e., if for any pair $\rho,\rho'$, there
exist quantities $A$ which can be measured and unitary (time dependence)
operators $U$ such that $\rho$ can be transformed into $\rho'$ by suitable
applications of Processes 1 and 2. We shall see that this is not always
possible, and that Process 1 can cause irreversible changes in mixtures.

For each mixture $\rho$ we define a quantity $I_\rho$:

(3.6)    $I_\rho = \mathrm{Trace}\,(\rho \ln \rho)$ .

This number, $I_\rho$, has the character of information. If $\rho = \sum_i P_i[\eta_i]$, a
mixture of orthogonal states $\eta_i$ weighted with $P_i$, then $I_\rho$ is simply
the information of the distribution $P_i$ over the eigenstates of $\rho$ (relative
to the uniform measure). (Trace $(\rho \ln \rho)$ is a unitary invariant and is
proportional to the negative of the entropy of the mixture, as discussed in
Chapter III, §2.)

Process 2 therefore has the property that it leaves $I_\rho$ unchanged,
because

(3.7)    $I_{\rho'} = \mathrm{Trace}\,(\rho' \ln \rho') = \mathrm{Trace}\,(U_t\rho U_t^{-1} \ln U_t\rho U_t^{-1})$
         $= \mathrm{Trace}\,(U_t\,\rho \ln \rho\; U_t^{-1}) = \mathrm{Trace}\,(\rho \ln \rho) = I_\rho$ .
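
The invariance (3.7) can be checked numerically in two dimensions, where the trace of a function of a Hermitian matrix is obtained from its two eigenvalues. The density matrix and the unitary matrix in the sketch below are arbitrary illustrative choices, and `information` is a hypothetical helper valid only for the 2x2 case.

```python
import cmath
import math

def matmul(X, Y):
    n = len(X)
    return [[sum(X[r][k] * Y[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def dagger(X):
    """Conjugate transpose; for a unitary matrix this is its inverse."""
    n = len(X)
    return [[X[c][r].conjugate() for c in range(n)] for r in range(n)]

def information(rho):
    """Trace(rho ln rho) for a 2x2 Hermitian rho, via its two eigenvalues."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    root = math.sqrt(max(trace * trace - 4 * det, 0.0))
    eigenvalues = [(trace + root) / 2, (trace - root) / 2]
    return sum(p * math.log(p) for p in eigenvalues if p > 0)

# Illustrative density matrix with eigenvalues 0.8 and 0.2.
rho = [[0.7 + 0j, 0.2 - 0.1j], [0.2 + 0.1j, 0.3 + 0j]]

alpha = 0.7
U = [[complex(math.cos(alpha)), -math.sin(alpha) * cmath.exp(0.3j)],
     [math.sin(alpha) * cmath.exp(-0.3j), complex(math.cos(alpha))]]

rho_evolved = matmul(matmul(U, rho), dagger(U))
I_before = information(rho)
I_after = information(rho_evolved)
print(I_before, I_after)
```

The two values coincide to rounding error, since the unitary change alters the eigenstates of the mixture but not its eigenvalues.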

Process 1, on the other hand, can decrease $I_\rho$ but never increase it.
According to (3.3):

(3.8)    $\rho' = \sum_j (\phi_j,\rho\,\phi_j)\,[\phi_j] = \sum_{i,j} P_i\,|(\eta_i,\phi_j)|^2\,[\phi_j] = \sum_j P_j'\,[\phi_j]$ ,

where $P_j' = \sum_i P_i T_{ij}$ and $T_{ij} = |(\eta_i,\phi_j)|^2$ is a doubly-stochastic
matrix. 5 But $I_{\rho'} = \sum_j P_j' \ln P_j'$ and $I_\rho = \sum_i P_i \ln P_i$, with the
$P_i$, $P_j'$ connected by $T_{ij}$, implies, by the theorem of information
decrease for stochastic processes (II-§6), that:

(3.9)    $I_{\rho'} \le I_\rho$ .

Moreover, it can easily be shown by a slight strengthening of the theorems
of Chapter II, §6 that strict inequality must hold unless (for each $i$ such
that $P_i > 0$) $T_{ij} = 1$ for one $j$ and $0$ for the rest ($T_{ij} = \delta_{i k_j}$). This
means that $|(\eta_i,\phi_j)|^2 = \delta_{i k_j}$, which implies that the original mixture was
already a mixture of eigenstates of the measurement.
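
A small computation illustrates the inequality (3.9). The sketch assumes real two-dimensional states, with the states of the mixture rotated by 30 degrees relative to the eigenstates of the measured quantity and weights 0.9 and 0.1; all of these are illustrative choices.

```python
import math

theta = math.pi / 6
eta = [(math.cos(theta), math.sin(theta)), (-math.sin(theta), math.cos(theta))]
phi = [(1.0, 0.0), (0.0, 1.0)]  # assumed eigenstates of the measured quantity
P = [0.9, 0.1]                  # illustrative weights of the original mixture

# Transition matrix: squared overlap of eta_i with phi_j (real vectors assumed).
T = [[sum(e * f for e, f in zip(eta[i], phi[j])) ** 2 for j in range(2)]
     for i in range(2)]
row_sums = [sum(row) for row in T]
col_sums = [sum(T[i][j] for i in range(2)) for j in range(2)]

# Weights after the measurement.
P_new = [sum(P[i] * T[i][j] for i in range(2)) for j in range(2)]

def information(dist):
    """Sum of p ln p over the distribution (zero weights contribute nothing)."""
    return sum(p * math.log(p) for p in dist if p > 0)

I_before, I_after = information(P), information(P_new)
print(T, P_new, I_before, I_after)
```

Both the rows and the columns of the transition matrix sum to one, and the information after the measurement is strictly smaller than before, since the original mixture was not a mixture of eigenstates of the measured quantity.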

We have answered our question, and it is not possible to get from any
mixture to another by means of Processes 1 and 2. There is an essential 
irreversibility to Process 1, since it corresponds to a stochastic process, 
which cannot be compensated by Process 2, which is reversible, like 
classical mechanics. 6 

Our theory of pure wave mechanics, to which we now return, must give 
equivalent results on the subjective level, since it leads to Process 1 
there. Therefore, measuring processes will appear to be irreversible to 
any observers (even though the composite system including the observer 
changes its state reversibly). 



5 Since $\sum_j T_{ij} = \sum_j |(\eta_i,\phi_j)|^2 = \sum_j (\eta_i,[\phi_j]\eta_i) = (\eta_i,\sum_j[\phi_j]\,\eta_i) = (\eta_i, I\eta_i) = 1$,
and similarly $\sum_i T_{ij} = 1$ because $T_{ij}$ is symmetric.

6 For another, more complete, discussion of this topic in the probabilistic
interpretation see von Neumann [17], Chapter V, §4.

There is another way of looking at this apparent irreversibility within
our theory which recognizes only Process 2. When an observer performs 
an observation the result is a superposition, each element of which de- 
scribes an observer who has perceived a particular value. From this time 
forward there is no interaction between the separate elements of the super- 
position (which describe the observer as having perceived different results), 
since each element separately continues to obey the wave equation. Each 
observer described by a particular element of the superposition behaves 
in the future completely independently of any events in the remaining ele- 
ments, and he can no longer obtain any information whatsoever concerning 
these other elements (they are completely unobservable to him). 

The irreversibility of the measuring process is therefore, within our 
framework, simply a subjective manifestation reflecting the fact that in 
observation processes the state of the observer is transformed into a 
superposition of observer states, each element of which describes an ob- 
server who is irrevocably cut off from the remaining elements. While it is 
conceivable that some outside agency could reverse the total wave func- 
tion, such a change cannot be brought about by any observer which is 
represented by a single element of a superposition, since he is entirely 
powerless to have any influence on any other elements. 

There are, therefore, fundamental restrictions to the knowledge that 
an observer can obtain about the state of the universe. It is impossible 
for any observer to discover the total state function of any physical sys- 
tem, since the process of observation itself leaves no independent state 
for the system or the observer, but only a composite system state in which 
the object-system states are inextricably bound up with the observer states. 
As soon as the observation is performed, the composite state is split into 
a superposition for which each element describes a different object-system 
state and an observer with (different) knowledge of it. Only the totality 
of these observer states, with their diverse knowledge, contains complete 
information about the original object-system state - but there is no possible
communication between the observers described by these separate
states. Any single observer can therefore possess knowledge only of the
relative state function (relative to his state) of any systems, which is in 
any case all that is of any importance to him. 

We conclude this section by commenting on another question which 
might be raised concerning irreversible processes: Is it necessary for 
the existence of measuring apparata, which can be correlated to other 
systems, to have frictional processes which involve systems of a large 
number of degrees of freedom? Are such thermodynamically irreversible 
processes possible in the framework of pure wave mechanics with a re- 
versible wave equation, and if so, does this circumstance pose any diffi- 
culties for our treatment of measuring processes? 

In the first place, it is certainly not necessary for dissipative proces- 
ses involving additional degrees of freedom to be present before an inter- 
action which correlates an apparatus to an object-system can take place. 
The counter-example is supplied by the simplified measuring process of 
III- §3, which involves only a system of one coordinate and an apparatus 
of one coordinate and no further degrees of freedom. 

To the question whether such processes are possible within reversi- 
ble wave mechanics, we answer yes, in the same sense that they are 
present in classical mechanics, where the microscopic equations of motion 
are also reversible. This type of irreversibility, which might be called 
macroscopic irreversibility, arises from a failure to separate "macroscopi- 
cally indistinguishable" states into "true" microscopic states. It has a 
fundamentally different character from the irreversibility of Process 1, 
which applies to micro-states as well and is peculiar to quantum mechanics.
Macroscopically irreversible phenomena are common to both classical
and quantum mechanics, since they arise from our incomplete information
concerning a system, not from any intrinsic behavior of the system. 8



7 See any textbook on statistical mechanics, such as ter Haar [11], Appendix I.

8 Cf. the discussion of Chapter II, §7. See also von Neumann [17], Chapter V, §4.






Finally, even when such frictional processes are involved, they pre- 
sent no new difficulties for the treatment of measuring and observation 
processes given here. We imposed no restrictions on the complexity or 
number of degrees of freedom of measuring apparatus or observers, and if 
any of these processes are present (such as heat reservoirs, etc.) then 
these systems are to be simply included as part of the apparatus or ob- 
server. 

§4. Approximate measurement 

A phenomenon which is difficult to understand within the framework 
of the probabilistic interpretation of quantum mechanics is the result of 
an approximate measurement. In the abstract formulation of the usual 
theory there are two fundamental processes; the discontinuous, probabilis- 
tic Process 1 corresponding to precise measurement, and the continuous, 
deterministic Process 2 corresponding to absence of any measurement. 
What mixture of probability and causality are we to apply to the case 
where only an approximate measurement is effected (i.e., where the appa- 
ratus or observer interacts only weakly and for a finite time with the 
object-system)? 

In the case of approximate measurement, we need to be supplied with 
rules which will tell us, for any initial object-system state, first, with 
what probability can we expect the various possible apparatus readings, 
and second, what new state to ascribe to the system after the value has 
been observed. We shall see that it is generally impossible to give these 
rules within a framework which considers the apparatus or observer as 
performing an (abstract) observation subject to Process 1, and that it is 
necessary, in order to give a full account of approximate measurements, 
to treat the entire system, including apparatus or observer, wave mechan- 
ically. 

The position that an approximate measurement results in the situation 
that the object-system state is changed into an eigenstate of the exact 
measurement, but for which particular one the observer has only imprecise 
information, is manifestly false. It is a fact that we can make successive
approximate position measurements of particles (in cloud chambers, for
example) and use the results for somewhat reliable predictions of future
positions. However, if either of these measurements left the particle in
an "eigenstate" of position ($\delta$ function), even though the particular one
remained unknown, the momentum would have such a variance that no such 
prediction would be possible. (The possibility of such predictions lies in 
the correlations between position and momentum at one time with position 
and momentum at a later time for wave packets 9 - correlations which are 
totally destroyed by precise measurements of either quantity.) 

Instead of continuing the discussion of the inadequacy of the proba- 
bilistic formulation, let us first investigate what actually happens in 
approximate measurements, from the viewpoint of pure wave mechanics. 
An approximate measurement consists of an interaction, for a finite time, 
which only imperfectly correlates the apparatus (or observer) with the 
object-system. We can deduce the desired rules in any particular case by 
the following method: For fixed interaction and initial apparatus state 
and for any initial object-system state we solve the wave equation for the 
time of interaction in question. The result will be a superposition of 
apparatus (observer) states and relative object-system states. Then 
(according to the method of Chapter IV for assigning a measure to a super- 
position) we assign a probability to each observed result equal to the 
square-amplitude of the coefficient of the element which contains the 
apparatus (observer) state representing the registering of that result. 
Finally, the object-system is assigned the new state which is its relative 
state in that element. 

For example, let us consider the measuring process described in Chap- 
ter III- §3, which is an excellent model for an approximate measurement. 
After the interaction, the total state was found to be (III-(3.12)):

(4.1) $\psi_t^{S+A} = \int \frac{1}{N_{r'}}\,\xi^{r'}(q)\,\delta(r-r')\,dr'$ .



9 See Bohm [1], p. 202.

Then, according to our prescription, we assign the probability density
P(r') to the observation of the apparatus coordinate r'

(4.2) $P(r') = \left|\frac{1}{N_{r'}}\right|^2 = \int \phi^*(q)\,\phi(q)\,\eta^*(r'-qt)\,\eta(r'-qt)\,dq$ ,

which is the square amplitude of the coefficient $(1/N_{r'})$ of the element
$\xi^{r'}(q)\,\delta(r-r')$ of the superposition (4.1) in which the apparatus coordinate
has the value r = r'. Then, depending upon the observed apparatus coordinate
r', we assign the object-system the new state

(4.3) $\xi^{r'}(q) = N_{r'}\,\phi(q)\,\eta(r'-qt)$

(where $\phi(q)$ is the old state, and $\eta(r)$ is the initial apparatus state)
which is the relative object-system state in (4.1) for apparatus coordinate r'.
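
The rules (4.2) and (4.3) can be tried out numerically. The sketch below is only an illustration under stated assumptions: the old state and the initial apparatus state are both taken to be real Gaussians, the widths SIGMA_Q and SIGMA_R and the interaction time T are arbitrary choices, and the integrals are plain Riemann sums on a finite grid. None of these names or values appear in the text.

```python
import math

def gaussian_amplitude(x, sigma):
    """Real amplitude whose square is a normal density with standard deviation sigma."""
    return (2.0 * math.pi * sigma ** 2) ** -0.25 * math.exp(-x ** 2 / (4.0 * sigma ** 2))

def integrate(values, step):
    """Plain Riemann sum; adequate for smooth, rapidly decaying integrands."""
    return sum(values) * step

SIGMA_Q, SIGMA_R, T = 1.0, 0.5, 1.0   # illustrative widths and interaction time
DQ = 0.02
Q_GRID = [-10.0 + DQ * k for k in range(1001)]

def reading_density(r_prime):
    """P(r') of (4.2): integral of |phi(q)|^2 |eta(r' - q t)|^2 dq."""
    return integrate([gaussian_amplitude(q, SIGMA_Q) ** 2 *
                      gaussian_amplitude(r_prime - q * T, SIGMA_R) ** 2
                      for q in Q_GRID], DQ)

def relative_state(r_prime):
    """xi^{r'}(q) of (4.3), normalized numerically on the grid."""
    raw = [gaussian_amplitude(q, SIGMA_Q) * gaussian_amplitude(r_prime - q * T, SIGMA_R)
           for q in Q_GRID]
    norm = math.sqrt(integrate([v * v for v in raw], DQ))
    return [v / norm for v in raw]

def variance(state):
    """Variance of q in a real, normalized state sampled on Q_GRID."""
    mean = integrate([q * v * v for q, v in zip(Q_GRID, state)], DQ)
    return integrate([(q - mean) ** 2 * v * v for q, v in zip(Q_GRID, state)], DQ)

xi = relative_state(0.8)
print("P(r' = 0.8)       =", reading_density(0.8))
print("old variance      =", SIGMA_Q ** 2)
print("relative variance =", variance(xi))
```

With these choices the relative state is narrower than the old state but retains a finite width, so the observation sharpens the position without producing a position "eigenstate".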

This example supplies the counter-example to another conceivable 
method of dealing with approximate measurement within the framework of 
Process 1. This is the position that when an approximate measurement 
of a quantity Q is performed, in actuality another quantity Q' is precisely
measured, where the eigenstates of Q' correspond to fairly well-defined
(i.e., sharply peaked distributions for) Q values. 10 However,
any such scheme based on Process 1 always has the prescription that 
after the measurement, the (unnormalized) new state function results from 
the old by a projection (on an eigenstate or eigenspace), which depends 
upon the observed value. If this is true, then in the above example the 
new state $\xi^{r'}(q)$ must result from the old, $\phi(q)$, by a projection E:

(4.4) $\xi^{r'}(q) = N E \phi(q) = N_{r'}\,\phi(q)\,\eta(r'-qt)$



10 Cf. von Neumann [17], Chapter IV, §4.

(where N, $N_{r'}$ are normalization constants). But E is only a projection
if $E^2 = E$. Applying the operation (4.4) twice, we get:

(4.5) $E(NE\phi(q)) = NE^2\phi(q) = N'\phi(q)\,\eta^2(r'-qt) \;\Rightarrow\; E^2\phi(q) = \frac{N'}{N}\,\phi(q)\,\eta^2(r'-qt)$ ,

and we see that E cannot be a projection unless $\eta(q) = \eta^2(q)$ for all
q (i.e., $\eta(q) = 0$ or 1 for all q) and we have arrived at a contradiction
to the assumption that in all cases the changes of states for approximate 
measurements are governed by projections. (In certain special cases, 
such as approximate position measurements with slits or Geiger counters, 11 
the new functions arise from the old by multiplication by sharp cutoff 
functions which are 1 over the slit or counter and 0 elsewhere, so that
these measurements can be handled by projections.) 
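
The idempotency argument can be checked on a grid. In the sketch below the change of state is modeled, up to normalization constants, as multiplication by a function eta(q); the smooth Gaussian and the sharp slit are illustrative choices standing in for a general apparatus function and for the special case just mentioned, and all names are hypothetical.

```python
import math

GRID = [-3.0 + 0.01 * k for k in range(601)]

def apply_multiplier(eta, state):
    """Act with the multiplication operator (E f)(q) = eta(q) f(q)."""
    return [eta(q) * f for q, f in zip(GRID, state)]

def max_difference(a, b):
    """Largest pointwise difference between two sampled functions."""
    return max(abs(x - y) for x, y in zip(a, b))

def gaussian_eta(q):
    """Smooth apparatus function: takes values other than 0 and 1."""
    return math.exp(-q * q)

def slit_eta(q):
    """Sharp cutoff: 1 over the slit, 0 elsewhere."""
    return 1.0 if abs(q) <= 1.0 else 0.0

phi = [math.exp(-0.5 * q * q) for q in GRID]   # an arbitrary old state

for name, eta in (("gaussian", gaussian_eta), ("slit", slit_eta)):
    once = apply_multiplier(eta, phi)
    twice = apply_multiplier(eta, once)
    print(name, "max |E^2 phi - E phi| =", max_difference(twice, once))
```

Only the slit function reproduces itself on a second application, in line with the requirement that eta(q) be 0 or 1 for all q.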

One cannot, therefore, account for approximate measurements by any 
scheme based on Process 1, and it is necessary to investigate these pro- 
cesses entirely wave-mechanically. Our viewpoint constitutes a frame- 
work in which it is possible to make precise deductions about such mea- 
surements and observations, since we can follow in detail the interaction 
of an observer or apparatus with an object-system. 

§5. Discussion of a spin measurement example 

We shall conclude this chapter with a discussion of an instructive 
example of Bohm. 12 Bohm considers the measurement of the z component 
of the angular momentum of an atom, whose total angular momentum is j, 
which is brought about by a Stern-Gerlach experiment. The measurement
is accomplished by passing an atomic beam through an inhomogeneous
magnetic field, which has the effect of giving the particle a momentum
which is directed up or down depending upon whether the spin was up or
down.

11 Cf. §2, this chapter.

12 Bohm [1], p. 593.

The measurement is treated as impulsive, so that during the time that 
the atom passes through the field the Hamiltonian is taken to be simply 
the interaction: 

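(5.1) $H_I = \mu(\vec{S}\cdot\vec{\mathcal{H}})$, $\mu = -\frac{e\hbar}{2mc}$

where $\vec{\mathcal{H}}$ is the magnetic field and $\vec{S}$ the spin operator for the atom. The
particle is presumed to pass through a region of the field where the field
is in the z direction, so that during the time of transit the field is
approximately $\mathcal{H}_z \cong \mathcal{H}_0 + z\mathcal{H}'_0$ ($\mathcal{H}_0 = (\mathcal{H}_z)_{z=0}$ and $\mathcal{H}'_0 = (\partial\mathcal{H}_z/\partial z)_{z=0}$), and
hence the interaction is approximately:

(5.2) $H_I \cong \mu(\mathcal{H}_0 + z\mathcal{H}'_0)S_z$ ,

where $S_z$ denotes the operator for the z component of the spin.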

It is assumed that the state of the atom, just prior to entry into the 
field, is a wave packet of the form: 

(5.3) $\psi_0 = f_0(z)(c_+ v_+ + c_- v_-)$

where $v_+$ and $v_-$ are the spin functions for $S_z = 1$ and $-1$ respectively.
Solving the Schrödinger equation for the Hamiltonian (5.2) and
initial condition (5.3) yields the state for a later time t :

(5.4) $\psi = f_0(z)\left(c_+ e^{-i\mu(\mathcal{H}_0 + z\mathcal{H}'_0)t/\hbar}\, v_+ + c_- e^{+i\mu(\mathcal{H}_0 + z\mathcal{H}'_0)t/\hbar}\, v_-\right)$ .

Therefore, if $\Delta t$ is the time that it takes the atom to traverse the field, 13
each component of the wave packet has been multiplied by a phase factor
$e^{\pm i\mu(\mathcal{H}_0 + z\mathcal{H}'_0)\Delta t/\hbar}$, i.e., has had its mean momentum in the z direction
changed by an amount $\pm\mathcal{H}'_0\mu\Delta t$, depending upon the spin direction. Thus
the initial wave packet (with mean momentum zero) is split into a superposition
of two packets, one with mean z-momentum $+\mathcal{H}'_0\mu\Delta t$ and spin
up, and the other with spin down and mean z-momentum $-\mathcal{H}'_0\mu\Delta t$.

13 This time is, strictly speaking, not well defined. The results, however, do
not depend critically upon it.

The interaction (5.2) has therefore served to correlate the spin with 
the momentum in the z-direction. These two packets of the resulting 
superposition now move in opposite z-directions, so that after a short 
time they become widely separated (provided that the momentum changes 
$\pm\mathcal{H}'_0\mu\Delta t$ are large compared to the momentum spread of the original
packet), and the z-coordinate is itself then correlated with the spin — 
representing the "apparatus" coordinate in this case. The Stern-Gerlach 
apparatus therefore splits an incoming wave packet into a superposition 
of two diverging packets, corresponding to the two spin values. 
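
The momentum kick produced by a phase factor that is linear in z can be verified numerically. The following sketch is an illustration only: the packet is assumed to be a real Gaussian, the values of MU, H0, H0_PRIME and DT are arbitrary (units with hbar = 1), and the mean momentum is estimated by central differences. Which spin component receives which sign depends on the sign conventions for the magnetic moment and the exponent, so the sketch checks only that the two kicks are equal and opposite, with magnitude equal to the product of MU, H0_PRIME and DT.

```python
import cmath
import math

HBAR = 1.0
MU, H0, H0_PRIME, DT = 1.0, 2.0, 0.5, 3.0   # illustrative values, arbitrary units
DZ = 0.01
Z = [-8.0 + DZ * k for k in range(1601)]

def packet(z):
    """Assumed initial packet: real Gaussian with mean momentum zero."""
    return math.exp(-z * z / 2.0)

def kicked(sign):
    """Packet multiplied by exp(sign * i * mu * (H0 + z * H0') * dt / hbar)."""
    return [packet(z) * cmath.exp(sign * 1j * MU * (H0 + z * H0_PRIME) * DT / HBAR)
            for z in Z]

def mean_momentum(psi):
    """Mean of -i hbar d/dz in psi, using central differences on the grid."""
    numerator = 0.0
    for k in range(1, len(psi) - 1):
        dpsi = (psi[k + 1] - psi[k - 1]) / (2.0 * DZ)
        numerator += (psi[k].conjugate() * (-1j * HBAR) * dpsi).real
    denominator = sum(abs(v) ** 2 for v in psi[1:-1])
    return numerator / denominator

p_plus = mean_momentum(kicked(+1))
p_minus = mean_momentum(kicked(-1))
print("mean momenta of the two components:", p_plus, p_minus)
print("predicted magnitude:", MU * H0_PRIME * DT)
```

The constant part of the field, H0, contributes only an overall phase to each component and does not affect either mean momentum.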

We take this opportunity to caution against a certain viewpoint which 
can lead to difficulties. This is the idea that, after an apparatus has 
interacted with a system, in "actuality" one or another of the elements 
of the resultant superposition described by the composite state-function 
has been realized to the exclusion of the rest, the existing one simply 
being unknown to an external observer (i.e., that instead of the super- 
position there is a genuine mixture). This position must be erroneous 
since there is always the possibility for the external observer to make 
use of interference properties between the elements of the superposition. 

In the present example, for instance, it is in principle possible to de- 
flect the two beams back toward one another with magnetic fields and re- 
combine them in another inhomogeneous field, which duplicates the first, 
in such a manner that the original spin state (before entering the apparatus)
is restored. 14 This would not be possible if the original Stern-Gerlach
apparatus performed the function of converting the original wave
packet into a non-interfering mixture of packets for the two spin cases.
Therefore the position that after the atom has passed through the inhomogeneous
field it is "really" in one or the other beam with the corresponding
spin, although we are ignorant of which one, is incorrect.

14 As pointed out by Bohm [1], p. 604.

After two systems have interacted and become correlated it is true 
that marginal expectations for subsystem operators can be calculated 
correctly when the composite system is represented by a certain non-interfering
mixture of states. Thus if the composite system state is

$\psi^{S_1+S_2} = \sum_i a_i\,\phi_i^{S_1}\,\eta_i^{S_2}$ ,

where the $\{\eta_i\}$ are orthogonal, then for purposes of calculating the
expectations of operators on $S_1$ the state $\psi^{S_1+S_2}$ is equivalent to the
non-interfering mixture of states $\phi_i^{S_1}\eta_i^{S_2}$ weighted by $P_i = a_i^* a_i$,
and one can take the picture that one or another of the cases $\phi_i^{S_1}\eta_i^{S_2}$
has been realized to the exclusion of the rest, with probabilities $P_i$. 15

However, this representation by a mixture must be regarded as only a
mathematical artifice which, although useful in many cases, is an incomplete
description because it ignores phase relations between the separate
elements which actually exist, and which become important in any interactions
which involve more than just a subsystem.

In the present example, the "composite system" is made of the "subsystems"
spin value (object-system) and z-coordinate (apparatus), and
the superposition of the two diverging wave packets is the state after
interaction. It is only correct to regard this state as a mixture so long as
any contemplated future interactions or measurements will involve only
the spin value or only the z-coordinate, but not both simultaneously. As
we saw, phase relations between the two packets are present and become
important when they are deflected back and recombined in another inhomogeneous
field - a process involving the spin values and z-coordinate
simultaneously.

15 See Chapter III, §1.
It is therefore improper to attribute any less validity or "reality" to
any element of a superposition than any other element, due to this ever 
present possibility of obtaining interference effects between the elements. 
All elements of a superposition must be regarded as simultaneously 
existing. 
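
The difference between the correlated superposition and the corresponding mixture can be seen in the smallest possible case. The sketch below assumes two two-state subsystems, takes the correlated states to be the basis states, and uses arbitrarily chosen amplitudes and operators; every name and number in it is illustrative rather than taken from the text.

```python
import math

def expectation(state, op):
    """Expectation of a 4x4 matrix op in a 4-component state."""
    return sum(state[i].conjugate() * op[i][j] * state[j]
               for i in range(4) for j in range(4)).real

def kron(a, b):
    """Kronecker product of 2x2 matrices; a acts on S1, b on S2 (index = 2*s1 + s2)."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]

IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
SIGMA_X = [[0.0, 1.0], [1.0, 0.0]]
A_ON_S1 = [[1.0, 0.5], [0.5, -1.0]]      # an arbitrary Hermitian operator on S1

a1, a2 = complex(math.sqrt(0.7)), complex(math.sqrt(0.3))

# Correlated state a1 phi_1 eta_1 + a2 phi_2 eta_2, with phi_i, eta_i basis states.
psi = [a1, 0j, 0j, a2]
# The non-interfering mixture: each element with its weight P_i = |a_i|^2.
branches = [([1 + 0j, 0j, 0j, 0j], abs(a1) ** 2),
            ([0j, 0j, 0j, 1 + 0j], abs(a2) ** 2)]

def mixture_expectation(op):
    """Weighted average of the expectations in the separate elements."""
    return sum(weight * expectation(state, op) for state, weight in branches)

subsystem_op = kron(A_ON_S1, IDENTITY)   # involves S1 only
joint_op = kron(SIGMA_X, SIGMA_X)        # involves S1 and S2 simultaneously

print("S1 only:", expectation(psi, subsystem_op), mixture_expectation(subsystem_op))
print("joint  :", expectation(psi, joint_op), mixture_expectation(joint_op))
```

The two descriptions agree for the operator on S1 alone and disagree for the joint operator, whose expectation depends on the phase relation between the two elements.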

At this time we should like to add a few remarks concerning the notion 
of transition probabilities in quantum mechanics. Often one considers a 
system, with Hamiltonian $H_0$ and stationary states $\{\phi_j\}$, to be perturbed
for a time by a time-dependent addition to the Hamiltonian, $H_I(t)$. Then
under the action of the perturbed Hamiltonian $H' = H_0 + H_I(t)$ the states
$\{\phi_j\}$ are generally no longer stationary but change after time t into new
states $\{\psi_j(t)\}$:

(5.5) $\phi_i \rightarrow \psi_i(t) = \sum_j (\phi_j, \psi_i(t))\,\phi_j = \sum_j a_{ij}(t)\,\phi_j$ ,

which can be represented as a superposition of the old stationary states
with time-dependent coefficients $a_{ij}(t)$.

If at time $\tau$ a measurement with eigenstates $\phi_j$ is performed, such
as an energy measurement (whose operator is the original $H_0$), then
according to the probabilistic interpretation the probability for finding the
state $\phi_j$, given that the state was originally $\phi_i$, is $P_{ij}(\tau) = |a_{ij}(\tau)|^2$.
The quantities $|a_{ij}(\tau)|^2$ are often referred to as transition probabilities.
In this case, however, the name is a misnomer, since it carries the connotation
that the original state $\phi_i$ is transformed into a mixture (of the $\phi_j$
weighted by $P_{ij}(\tau)$), and gives the erroneous impression that the quantum
formalism itself implies the existence of quantum-jumps (stochastic processes)
independent of acts of observation. This is incorrect since there
is still a pure state $\sum_j a_{ij}(\tau)\,\phi_j$ with phase relations between the $\phi_j$,
and expectations of operators other than the energy must be calculated
from the superposition and not the mixture.
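
The point can be made explicit by writing out the expectation of an arbitrary operator A in the pure state of (5.5). The decomposition below is a standard expansion added here for illustration; the labels "pure" and "mixture" are ours.

```latex
\langle A \rangle_{\text{pure}}
  = \sum_{j,k} a_{ij}^{*}(\tau)\, a_{ik}(\tau)\, (\phi_j, A\,\phi_k)
  = \underbrace{\sum_{j} |a_{ij}(\tau)|^{2}\, (\phi_j, A\,\phi_j)}_{\langle A \rangle_{\text{mixture}}}
  \;+\; \sum_{j \neq k} a_{ij}^{*}(\tau)\, a_{ik}(\tau)\, (\phi_j, A\,\phi_k)
```

The cross terms vanish when A is diagonal in the $\phi_j$, as the energy is, which is why the mixture gives correct results for energy measurements and only for operators of that kind.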

There is another case, however, the one usually encountered in fact, 
where the transition probability concept is somewhat more justified. This 
is the case in which the perturbation is due to interaction of the system
$S_1$ with another system $S_2$, and not simply a time dependence of $S_1$'s
Hamiltonian as in the case just considered. In this situation the interaction
produces a composite system state, for which there are in general no
independent subsystem states. However, as we have seen, for purposes
of calculating expectations of operators on $S_1$ alone, we can regard $S_1$
as being represented by a certain mixture. According to this picture the
states of subsystem $S_1$ are gradually converted into mixtures by the
interaction with $S_2$ and the concept of transition probability makes some
sense. Of course, it must be remembered that this picture is only justified
so long as further measurements on $S_1$ alone are contemplated, and
any attempt to make a simultaneous determination in $S_1$ and $S_2$ involves
the composite state where interference properties may be important.

An example is a hydrogen atom interacting with the electromagnetic 
field. After a time of interaction we can picture the atom as being in a 
mixture of its states, so long as we consider future measurements on the 
atom only. But in actuality the state of the atom is dependent upon 
(correlated with) the state of the field, and some process involving both 
atom and field could conceivably depend on interference effects between 
the states of the alleged mixture. With these restrictions, however, the 
concept of transition probability is quite useful and justified. 



VI. DISCUSSION 



We have shown that our theory based on pure wave mechanics, which 
takes as the basic description of physical systems the state function — 
supposed to be an objective description (i.e., in one-one, rather than 
statistical, correspondence to the behavior of the system) — can be put in 
satisfactory correspondence with experience. We saw that the probabilis- 
tic assertions of the usual interpretation of quantum mechanics can be 
deduced from this theory, in a manner analogous to the methods of classi- 
cal statistical mechanics, as subjective appearances to observers — 
observers which were regarded simply as physical systems subject to the 
same type of description and laws as any other systems, and having no 
preferred position. The theory is therefore capable of supplying us with 
a complete conceptual model of the universe, consistent with the assump- 
tion that it contains more than one observer. 

Because the theory gives us an objective description, it constitutes a 
framework in which a number of puzzling subjects (such as classical level 
phenomena, the measuring process itself, the inter-relationship of several 
observers, questions of reversibility and irreversibility, etc.) can be in- 
vestigated in detail in a logically consistent manner. It supplies a new 
way of viewing processes, which clarifies many apparent paradoxes of the 
usual interpretation 1 - indeed, it constitutes an objective framework in 
which it is possible to understand the general consistency of the ordinary 
view. 



1 Such as that of Einstein, Rosen, and Podolsky [8], as well as the paradox of 
the introduction. 




We shall now resume our discussion of alternative interpretations. 
There has been expressed lately a great deal of dissatisfaction with the 
present form of quantum theory by a number of authors, and a wide variety 
of new interpretations have sprung into existence. We shall now attempt 
to classify briefly a number of these interpretations, and comment upon 
them. 

a. The "popular" interpretation. This is the scheme alluded to in 
the introduction, where $\psi$ is regarded as objectively characterizing
the single system, obeying a deterministic wave equation when
the system is isolated but changing probabilistically and discon- 
tinuously under observation. 

In its unrestricted form this view can lead to paradoxes like that men- 
tioned in the introduction, and is therefore untenable. However, this view 
is consistent so long as it is assumed that there is only one observer in 
the universe (the solipsist position — Alternative 1 of the Introduction). 
This consistency is most easily understood from the viewpoint of our own 
theory, where we were able to show that all phenomena will seem to follow 
the predictions of this scheme to any observer. Our theory therefore justi- 
fies the personal adoption of this probabilistic interpretation, for purposes 
of making practical predictions, from a more satisfactory framework. 

b. The Copenhagen interpretation. This is the interpretation developed 
by Bohr. The $\psi$ function is not regarded as an objective description
of a physical system (i.e., it is in no sense a conceptual
model), but is regarded as merely a mathematical artifice which 
enables one to make statistical predictions, albeit the best predic- 
tions which it is possible to make. This interpretation in fact 
denies the very possibility of a single conceptual model applicable 
to the quantum realm, and asserts that the totality of phenomena 
can only be understood by the use of different, mutually exclusive 
(i.e., "complementary") models in different situations. All statements
about microscopic phenomena are regarded as meaningless
unless accompanied by a complete description (classical) of an 
experimental arrangement. 

While undoubtedly safe from contradiction, due to its extreme conserva- 
tism, it is perhaps overcautious. We do not believe that the primary pur- 
pose of theoretical physics is to construct "safe" theories at severe cost 
in the applicability of their concepts, which is a sterile occupation, but 
to make useful models which serve for a time and are replaced as they are 
outworn. 2 

Another objectionable feature of this position is its strong reliance 
upon the classical level from the outset, which precludes any possibility 
of explaining this level on the basis of an underlying quantum theory. (The 
deduction of classical phenomena from quantum theory is impossible simply 
because no meaningful statements can be made without pre-existing classi- 
cal apparatus to serve as a reference frame.) This interpretation suffers 
from the dualism of adhering to a "reality" concept (i.e., the possibility 
of objective description) on the classical level but renouncing the same 
in the quantum domain. 

c. The "hidden variables" interpretation. This is the position 
(Alternative 4 of the Introduction) that $\psi$ is not a complete description
of a single system. It is assumed that the correct complete
description, which would involve further (hidden) parameters,
would lead to a deterministic theory, from which the probabilistic 
aspects arise as a result of our ignorance of these extra parameters 
in the same manner as in classical statistical mechanics. 



2 Cf. Appendix II.

The $\psi$-function is therefore regarded as a description of an ensemble
of systems rather than a single system. Proponents of this interpretation
include Einstein, 3 Bohm, 4 Wiener and Siegal. 5

Einstein hopes that a theory along the lines of his general relativity, 
where all of physics is reduced to the geometry of space-time could satis- 
factorily explain quantum effects. In such a theory a particle is no longer 
a simple object but possesses an enormous amount of structure (i.e., it is 
thought of as a region of space-time of high curvature). It is conceivable 
that the interactions of such "particles" would depend in a sensitive way 
upon the details of this structure, which would then play the role of the 
"hidden variables." 6 However, these theories are non-linear and it is 
enormously difficult to obtain any conclusive results. Nevertheless, the 
possibility cannot be discounted. 

Bohm considers $\psi$ to be a real force field acting on a particle which
always has a well-defined position and momentum (which are the hidden
variables of this theory). The $\psi$-field satisfying Schrödinger's equation
is pictured as somewhat analogous to the electromagnetic field satisfying
Maxwell's equations, although for systems of n particles the $\psi$-field is
in a 3n-dimensional space. With this theory Bohm succeeds in showing 
that in all actual cases of measurement the best predictions that can be 
made are those of the usual theory, so that no experiments could ever rule 
out his interpretation in favor of the ordinary theory. Our main criticism 
of this view is on the grounds of simplicity — if one desires to hold the 
view that $\psi$ is a real field then the associated particle is superfluous
since, as we have endeavored to illustrate, the pure wave theory is itself 
satisfactory. 



3 Einstein [7].

4 Bohm [2].

5 Wiener and Siegal [20].

6 For an example of this type of theory see Einstein and Rosen [9].

Wiener and Siegal have developed a theory which is more closely tied
to the formalism of quantum mechanics. From the set N of all non- 
degenerate linear Hermitian operators for a system having a complete set 
of eigenstates, a subset I is chosen such that no two members of I com- 
mute and every element outside I commutes with at least one element of 
I. The set I therefore contains precisely one operator for every orienta- 
tion of the principal axes of the Hilbert space for the system. It is postu- 
lated that each of the operators of I corresponds to an independent ob- 
servable which can take any of the real numerical values of the spectrum 
of the operator. This theory, in its present form, is a theory of infinitely 7
many "hidden variables," since a system is pictured as possessing (at 
each instant) a value for every one of these "observables" simultaneously, 
with the changes in these values obeying precise (deterministic) dynamical 
laws. However, the change of any one of these variables with time depends 
upon the entire set of observables, so that it is impossible ever to discover 
by measurement the complete set of values for a system (since only one 
"observable" at a time can be observed). Therefore, statistical ensembles 
are introduced, in which the values of all of the observables are related to 
points in a "differential space," which is a Hilbert space containing a 
measure for which each (differential space) coordinate has an independent 
normal distribution. It is then shown that the resulting statistical dynamics 
is in accord with the usual form of quantum theory. 

It cannot be disputed that these theories are often appealing, and might 
conceivably become important should future discoveries indicate serious 
inadequacies in the present scheme (i.e., they might be more easily modi- 
fied to encompass new experience). But from our viewpoint they are 
usually more cumbersome than the conceptually simpler theory based on 
pure wave mechanics. Nevertheless, these theories are of great theoretical 
importance because they provide us with examples that "hidden variables" 
theories are indeed possible. 



7 A non-denumerable infinity, in fact, since the set I is uncountable!

d. The stochastic process interpretation. This is the point of view
which holds that the fundamental processes of nature are stochas- 
tic (i.e., probabilistic) processes. According to this picture 
physical systems are supposed to exist at all times in definite 
states, but the states are continually undergoing probabilistic 
changes. The discontinuous probabilistic "quantum-jumps" are 
not associated with acts of observation, but are fundamental to the 
systems themselves. 

A stochastic theory which emphasizes the particle, rather than wave, 
aspects of quantum theory has been investigated by Bopp. 8 The particles
do not obey deterministic laws of motion, but rather probabilistic laws, 
and by developing a general "correlation statistics" Bopp shows that his 
quantum scheme is a special case which gives results in accord with the 
usual theory. (This accord is only approximate and in principle one could 
decide between the theories. The approximation is so close, however, 
that it is hardly conceivable that a decision would be practically feasible.) 

Bopp's theory seems to stem from a desire to have a theory founded 
upon particles rather than waves, since it is this particle aspect (highly 
localized phenomena) which is most frequently encountered in present day 
high-energy experiments (cloud chamber tracks, etc.). However, it seems 
to us to be much easier to understand particle aspects from a wave picture 
(concentrated wave packets) than it is to understand wave aspects (diffrac- 
tion, interference, etc.) from a particle picture. 

Nevertheless, there can be no fundamental objection to the idea of a 
stochastic theory, except on grounds of a naked prejudice for determinism. 
The question of determinism or indeterminism in nature is obviously for- 
ever undecidable in physics, since for any current deterministic [proba- 
bilistic] theory one could always postulate that a refinement of the theory 



Bopp [5]. 
would disclose a probabilistic [deterministic] substructure, and that the
current deterministic [probabilistic] theory is to be explained in terms of 
the refined theory on the basis of the law of large numbers [ignorance of 
hidden variables]. However, it is quite another matter to object to a mix- 
ture of the two where the probabilistic processes occur only with acts of 
observation. 

e. The wave interpretation. This is the position proposed in the 
present thesis, in which the wave function itself is held to be the 
fundamental entity, obeying at all times a deterministic wave 
equation. 

This view also corresponds most closely with that held by Schrödinger. 9
However, this picture only makes sense when observation processes them- 
selves are treated within the theory. It is only in this manner that the 
apparent existence of definite macroscopic objects, as well as localized 
phenomena, such as tracks in cloud chambers, can be satisfactorily ex- 
plained in a wave theory where the waves are continually diffusing. With 
the deduction in this theory that phenomena will appear to observers to be
subject to Process 1, Heisenberg's criticism 10 of Schrödinger's opinion
(that continuous wave mechanics could not seem to explain the discontinuities
which are everywhere observed) is effectively met. The "quantum-jumps"
exist in our theory as relative phenomena (i.e., the states of an
object-system relative to chosen observer states show this effect), while
the absolute states change quite continuously.

The wave theory is definitely tenable and forms, we believe, the 
simplest complete, self-consistent theory. 



9 Schrödinger [18].
10 Heisenberg [14].
We should like now to comment on some views expressed by Einstein.
Einstein's 11 criticism of quantum theory (which is actually directed more 
against what we have called the "popular" view than Bohr's interpreta- 
tion) is mainly concerned with the drastic changes of state brought about 
by simple acts of observation (i.e., the infinitely rapid collapse of wave 
functions), particularly in connection with correlated systems which are 
widely separated so as to be mechanically uncoupled at the time of
observation. 12 At another time he put his feeling colorfully by stating
that he could not believe that a mouse could bring about drastic changes
in the universe simply by looking at it. 13

However, from the standpoint of our theory, it is not so much the sys- 
tem which is affected by an observation as the observer, who becomes 
correlated to the system. 

In the case of observation of one system of a pair of spatially sepa- 
rated, correlated systems, nothing happens to the remote system to make 
any of its states more "real" than the rest. It had no independent states 
to begin with, but a number of states occurring in a superposition with 
corresponding states for the other (near) system. Observation of the near 
system simply correlates the observer to this system, a purely local pro- 
cess — but a process which also entails automatic correlation with the 
remote system. Each state of the remote system still exists with the same
amplitude in a superposition, but now a superposition for which each element
contains, in addition to a remote system state and correlated near system
state, an observer state which describes an observer who perceives the
state of the near system. 14 From the present viewpoint all elements of



11 Einstein [7].

12 For example, the paradox of Einstein, Rosen, and Podolsky [8].

13 Address delivered at Palmer Physical Laboratory, Princeton, Spring, 1954.

14 See in this connection Chapter IV, particularly pp. 82, 83.
this superposition are equally "real." Only the observer state has
changed, so as to become correlated with the state of the near system and 
hence naturally with that of the remote system also. The mouse does not 
affect the universe — only the mouse is affected. 
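
The statement that a local observation leaves every remote-system state with the same amplitude can be illustrated with a toy state vector. The following is a minimal sketch and not part of the original argument: it assumes two-valued near and remote systems in a perfectly correlated superposition, an observer with a hypothetical "ready" memory state, and an idealized observation that simply records the near-system value in the observer's memory. All names are illustrative.

```python
from math import sqrt, isclose

# Basis labels are (remote, near, observer); amplitudes are real here.
# Hypothetical initial state: correlated pair, observer not yet correlated.
initial = {
    (0, 0, "ready"): 1 / sqrt(2),
    (1, 1, "ready"): 1 / sqrt(2),
}

def observe_near(state):
    """Idealized local observation: the observer's memory records the
    near-system value; the remote and near labels are left untouched."""
    after = {}
    for (remote, near, memory), amp in state.items():
        if memory != "ready":
            raise ValueError("this sketch only handles a fresh observer")
        key = (remote, near, f"sees {near}")
        after[key] = after.get(key, 0.0) + amp
    return after

def remote_weights(state):
    """Total squared amplitude attached to each remote-system value."""
    weights = {}
    for (remote, _near, _memory), amp in state.items():
        weights[remote] = weights.get(remote, 0.0) + amp * amp
    return weights

final = observe_near(initial)
before, after = remote_weights(initial), remote_weights(final)
```

In this sketch each element of the final superposition carries the same amplitude as before, and the weights attached to the remote-system values are unchanged; only the observer label differs between the initial and final states.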

Our theory in a certain sense bridges the positions of Einstein and 
Bohr, since the complete theory is quite objective and deterministic ("God 
does not play dice with the universe"), and yet on the subjective level, 
of assertions relative to observer states, it is probabilistic in the strong 
sense that there is no way for observers to make any predictions better 
than the limitations imposed by the uncertainty principle. 15 

In conclusion, we have seen that if we wish to adhere to objective 
descriptions then the principle of the psycho-physical parallelism requires 
that we should be able to consider some mechanical devices as represent- 
ing observers. The situation is then that such devices must either cause 
the probabilistic discontinuities of Process 1, or must be transformed into 
the superpositions we have discussed. We are forced to abandon the for- 
mer possibility since it leads to the situation that some physical systems 
would obey different laws from the rest, with no clear means for distin- 
guishing between these two types of systems. We are thus led to our 
present theory which results from the complete abandonment of Process 1 
as a basic process. Nevertheless, within the context of this theory, 
which is objectively deterministic, it develops that the probabilistic 
aspects of Process 1 reappear at the subjective level, as relative phenom- 
ena to observers. 

One is thus free to build a conceptual model of the universe, which 
postulates only the existence of a universal wave function which obeys a 
linear wave equation. One then investigates the internal correlations in 
this wave function with the aim of deducing laws of physics, which are 



15 Cf. Chapter V, §2. 
statements that take the form: Under the conditions C the property A
of a subsystem of the universe (subset of the total collection of coordi- 
nates for the wave function) is correlated with the property B of another 
subsystem (with the manner of correlation being specified). For example, 
the classical mechanics of a system of massive particles becomes a law 
which expresses the correlation between the positions and momenta 
(approximate) of the particles at one time with those at another time. 16 
All statements about subsystems then become relative statements, i.e., 
statements about the subsystem relative to a prescribed state for the re- 
mainder (since this is generally the only way a subsystem even possesses 
a unique state), and all laws are correlation laws. 

The theory based on pure wave mechanics is a conceptually simple 
causal theory, which fully maintains the principle of the psycho-physical 
parallelism. It therefore forms a framework in which it is possible to dis- 
cuss (in addition to ordinary phenomena) observation processes them- 
selves, including the inter-relationships of several observers, in a logical, 
unambiguous fashion. In addition, all of the correlation paradoxes, like 
that of Einstein, Rosen, and Podolsky, 17 find easy explanation. 

While our theory justifies the personal use of the probabilistic inter- 
pretation as an aid to making practical predictions, it forms a broader 
frame in which to understand the consistency of that interpretation. It 
transcends the probabilistic theory, however, in its ability to deal logi- 
cally with questions of imperfect observation and approximate measurement. 

16 Cf. Chapter V, §2.

17 Einstein, Rosen, and Podolsky [8].

Since this viewpoint will be applicable to all forms of quantum mechanics
which maintain the superposition principle, it may prove a fruitful
framework for the interpretation of new quantum formalisms. Field theories,
particularly any which might be relativistic in the sense of general
relativity, might benefit from this position, since one is free to construct
formal (non-probabilistic) theories, and supply any possible statistical
interpretations later. (This viewpoint avoids the necessity of considering
anomalous probabilistic jumps scattered about space-time, and one can
assert that field equations are satisfied everywhere and everywhen, then
deduce any statistical assertions by the present method.)

By focusing attention upon questions of correlations, one may be able 
to deduce useful relations (correlation laws analogous to those of classi- 
cal mechanics) for theories which at present do not possess known classi- 
cal counterparts. Quantized fields do not generally possess pointwise 
independent field values, the values at one point of space-time being 
correlated with those at neighboring points of space-time in a manner, it 
is to be expected, approximating the behavior of their classical counter- 
parts. If correlations are important in systems with only a finite number 
of degrees of freedom, how much more important they must be for systems 
of infinitely many coordinates. 

Finally, aside from any possible practical advantages of the theory, 
it remains a matter of intellectual interest that the statistical assertions 
of the usual interpretation do not have the status of independent hypoth- 
eses, but are deducible (in the present sense) from the pure wave mechan- 
ics, which results from their omission. 



APPENDIX I 



We shall now supply the proofs of a number of assertions which have 
been made in the text. 

§1. Proof of Theorem 1 

We now show that $\{X,Y,\ldots,Z\} > 0$ unless $X,Y,\ldots,Z$ are independent
random variables. Abbreviate $P(x_i,y_j,\ldots,z_k)$ by $P_{ij\ldots k}$, and let

$$Q_{ij\ldots k} = \begin{cases} \dfrac{P_{ij\ldots k}}{P_i P_j \cdots P_k} & \text{if } P_i P_j \cdots P_k > 0 , \\[1ex] 1 & \text{if } P_i P_j \cdots P_k = 0 . \end{cases} \tag{1.1}$$

(Note that $P_i P_j \cdots P_k = 0$ implies that also $P_{ij\ldots k} = 0$.) Then always

$$P_{ij\ldots k} = Q_{ij\ldots k}\, P_i P_j \cdots P_k , \tag{1.2}$$

and we have

$$\{X,Y,\ldots,Z\} = \operatorname{Exp}\left[\ln \frac{P_{ij\ldots k}}{P_i P_j \cdots P_k}\right] = \operatorname{Exp}\left[\ln Q_{ij\ldots k}\right] = \sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k} . \tag{1.3}$$

Applying the inequality for $x \ge 0$:

$$x \ln x > x - 1 \quad (\text{except for } x = 1) \tag{1.4}$$

(which is easily established by calculating the minimum of $x \ln x - (x-1)$)
to (1.3) we have:

$$P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k} > P_i P_j \cdots P_k\, (Q_{ij\ldots k} - 1) \quad (\text{unless } Q_{ij\ldots k} = 1) . \tag{1.5}$$

Therefore we have for the sum:

$$\sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k} > \sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} - \sum_{ij\ldots k} P_i P_j \cdots P_k \tag{1.6}$$

unless all $Q_{ij\ldots k} = 1$. But $\sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} = \sum_{ij\ldots k} P_{ij\ldots k} = 1$ and
$\sum_{ij\ldots k} P_i P_j \cdots P_k = 1$, so that the right side of (1.6) vanishes. The left
side is, by (1.3), the correlation $\{X,Y,\ldots,Z\}$, and the condition that all of
the $Q_{ij\ldots k}$ equal one is precisely the independence condition that
$P_{ij\ldots k} = P_i P_j \cdots P_k$ for all $i,j,\ldots,k$. We have therefore proved that

$$\{X,Y,\ldots,Z\} > 0 \tag{1.7}$$

unless $X,Y,\ldots,Z$ are mutually independent.
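
As a numerical illustration of (1.7), the following minimal sketch evaluates the correlation for two random variables. It is not part of the original proof: the function name `correlation` and both example tables are assumptions chosen only for illustration, and terms with $P_{ij} = 0$ are taken to contribute nothing.

```python
from math import log, isclose

def correlation(joint):
    """{X,Y} = sum_ij P_ij ln(P_ij / (P_i P_j)) for a joint table given as
    a list of rows; terms with P_ij = 0 contribute nothing."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    total = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                total += p * log(p / (row[i] * col[j]))
    return total

# Hypothetical example distributions (not taken from the text).
dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]

c_dep = correlation(dependent)      # strictly positive
c_ind = correlation(independent)    # zero up to rounding
```

The product-form table gives a correlation that vanishes up to floating-point rounding, while the table with dependence gives a strictly positive value, as the theorem requires.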

§2. Convex function inequalities 

We shall now establish some basic inequalities which follow from the 
convexity of the function x ln x. 

Lemma 1. $x_i \ge 0$, $P_i \ge 0$, $\sum_i P_i = 1$

$$\Longrightarrow \quad \Big(\sum_i P_i x_i\Big) \ln \Big(\sum_i P_i x_i\Big) \le \sum_i P_i x_i \ln x_i .$$

This property is usually taken as the definition of a convex function, 1 
but follows from the fact that the second derivative of x ln x is positive 
for all positive x, which is the elementary notion of convexity. There is 
also an immediate corollary for the continuous case: 



1 See Hardy, Littlewood, and Polya [13], p. 70. 
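Corollary 1. $g(x) \ge 0$, $P(x) \ge 0$, $\int P(x)\,dx = 1$

$$\Longrightarrow \quad \left[\int P(x) g(x)\,dx\right] \ln \left[\int P(x) g(x)\,dx\right] \le \int P(x) g(x) \ln g(x)\,dx .$$

We can now derive a more general and very useful inequality from
Lemma 1:

Lemma 2. $x_i \ge 0$, $a_i \ge 0$ (all $i$)

$$\Longrightarrow \quad \Big(\sum_i x_i\Big) \ln \left(\frac{\sum_i x_i}{\sum_i a_i}\right) \le \sum_i x_i \ln \left(\frac{x_i}{a_i}\right) .$$

Proof: Let $P_i = a_i / \sum_i a_i$, so that $P_i \ge 0$ and $\sum_i P_i = 1$. Then by
Lemma 1:

$$\left[\sum_i P_i \left(\frac{x_i}{a_i}\right)\right] \ln \left[\sum_i P_i \left(\frac{x_i}{a_i}\right)\right] \le \sum_i P_i \left(\frac{x_i}{a_i}\right) \ln \left(\frac{x_i}{a_i}\right) . \tag{2.1}$$

Substitution for $P_i$ yields:

$$\left[\frac{\sum_i x_i}{\sum_i a_i}\right] \ln \left[\frac{\sum_i x_i}{\sum_i a_i}\right] \le \sum_i \frac{x_i}{\sum_i a_i} \ln \left(\frac{x_i}{a_i}\right) , \tag{2.2}$$

which reduces to

$$\Big(\sum_i x_i\Big) \ln \left(\frac{\sum_i x_i}{\sum_i a_i}\right) \le \sum_i x_i \ln \left(\frac{x_i}{a_i}\right) , \tag{2.3}$$

and we have proved the lemma.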
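We also mention the analogous result for the continuous case:

Corollary 2. $f(x) \ge 0$, $g(x) \ge 0$ (all $x$)

$$\Longrightarrow \quad \left[\int f(x)\,dx\right] \ln \left[\frac{\int f(x)\,dx}{\int g(x)\,dx}\right] \le \int f(x) \ln \left(\frac{f(x)}{g(x)}\right) dx .$$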
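
The inequality of Lemma 2 can be checked numerically on small examples. The sketch below is illustrative only: the helper name `lemma2_sides` and the sample values are assumptions, every $a_i$ is taken strictly positive so that the ratios are defined, and terms with $x_i = 0$ are taken as zero.

```python
from math import log

def lemma2_sides(x, a):
    """Return (left, right) of (sum x) ln(sum x / sum a) <= sum x ln(x/a).
    Assumes every a_i > 0; terms with x_i = 0 are taken as 0."""
    sx, sa = sum(x), sum(a)
    left = sx * log(sx / sa) if sx > 0 else 0.0
    right = sum(xi * log(xi / ai) for xi, ai in zip(x, a) if xi > 0)
    return left, right

# Hypothetical values chosen only for illustration.
left, right = lemma2_sides([0.2, 0.5, 0.3], [1.0, 2.0, 0.5])

# Equality case: x proportional to a, so every ratio x_i / a_i is the same.
eq_left, eq_right = lemma2_sides([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

When the $x_i$ are proportional to the $a_i$ the two sides coincide, which is the equality case of the convexity argument used in the proof.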



§3. Refinement theorems 

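We now supply the proof for Theorems 2 and 4 of Chapter II, which
concern the behavior of correlation and information upon refinement of the
distributions. We suppose that the original (unrefined) distribution is
$P_{ij\ldots k} = P(x_i, y_j, \ldots, z_k)$, and that the refined distribution is
${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$, where the original value $x_i$ for $X$ has been resolved
into a number of values $x_i^{\mu_i}$, and similarly for $Y,\ldots,Z$. Then:

$$P_{ij\ldots k} = \sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} , \qquad P_i = \sum_{\mu_i} {P'}^{\mu_i}_i , \quad \text{etc.} \tag{3.1}$$

Computing the new correlation $\{X,Y,\ldots,Z\}'$ for the refined distribution
${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$ we find:

$$\{X,Y,\ldots,Z\}' = \sum_{ij\ldots k} \; \sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} \ln \left( \frac{{P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}}{{P'}^{\mu_i}_i {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right) . \tag{3.2}$$

However, by Lemma 2, §2:

$$\sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} \ln \left( \frac{{P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}}{{P'}^{\mu_i}_i {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right) \ge \Big( \sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} \Big) \ln \left( \frac{\sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}}{\sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i}_i {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right) . \tag{3.3}$$

Substitution of (3.3) into (3.2), noting that $\sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i}_i {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k$ is
equal to $\big(\sum_{\mu_i} {P'}^{\mu_i}_i\big) \big(\sum_{\nu_j} {P'}^{\nu_j}_j\big) \cdots \big(\sum_{\eta_k} {P'}^{\eta_k}_k\big) = P_i P_j \cdots P_k$, leads to:

$$\{X,Y,\ldots,Z\}' \ge \sum_{ij\ldots k} P_{ij\ldots k} \ln \left( \frac{P_{ij\ldots k}}{P_i P_j \cdots P_k} \right) = \{X,Y,\ldots,Z\} , \tag{3.4}$$

and we have completed the proof of Theorem 2 (Chapter II), which asserts
that refinement never decreases the correlation.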

We now consider the effect of refinement upon the relative information.
We shall use the previous notation, and further assume that ${a'}^{\mu_i}_i, {b'}^{\nu_j}_j, \ldots, {c'}^{\eta_k}_k$
are the information measures for which we wish to compute the relative
information of ${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$ and of $P_{ij\ldots k}$. The information
measures for the unrefined distribution $P_{ij\ldots k}$ then satisfy the relations:

$$a_i = \sum_{\mu_i} {a'}^{\mu_i}_i , \qquad b_j = \sum_{\nu_j} {b'}^{\nu_j}_j , \quad \ldots \tag{3.5}$$

The relative information of the refined distribution is

$$I'_{XY\ldots Z} = \sum_{ij\ldots k} \; \sum_{\mu_i \nu_j \ldots \eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} \ln \left( \frac{{P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}}{{a'}^{\mu_i}_i {b'}^{\nu_j}_j \cdots {c'}^{\eta_k}_k} \right) , \tag{3.6}$$

and by exactly the same procedure as we have just used for the correlation
we arrive at the result:

$$I'_{XY\ldots Z} \ge \sum_{ij\ldots k} P_{ij\ldots k} \ln \left( \frac{P_{ij\ldots k}}{a_i b_j \cdots c_k} \right) = I_{XY\ldots Z} , \tag{3.7}$$

and we have proved that refinement never decreases the relative information
(Theorem 4, Chapter II).

Cf. Shannon [19], Appendix 7, where a quite similar theorem is proved.

It is interesting to note that the relation (3.4) for the behavior of 
correlation under refinement can be deduced from the behavior of relative 
information, (3.7). This deduction is an immediate consequence of the 
fact that the correlation is a relative information — the information of the 
joint distribution relative to the product measure of the marginal distribu- 
tions. 
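
The refinement theorem for the correlation can be illustrated by starting from a refined table and merging values to obtain the unrefined one. This is a minimal sketch for two variables under stated assumptions: the helper names `correlation` and `coarsen` and the 4 x 4 table are hypothetical, chosen only so that the entries sum to one.

```python
from math import log

def correlation(joint):
    """sum_ij P_ij ln(P_ij / (P_i P_j)); zero entries contribute nothing."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    return sum(p * log(p / (row[i] * col[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def coarsen(joint, x_groups, y_groups):
    """Merge refined values into unrefined ones: each group lists the
    refined indices that together make up one unrefined value."""
    return [[sum(joint[i][j] for i in gx for j in gy) for gy in y_groups]
            for gx in x_groups]

# Hypothetical refined distribution over 4 x 4 values (entries sum to 1).
refined = [[0.10, 0.05, 0.02, 0.03],
           [0.05, 0.10, 0.03, 0.02],
           [0.02, 0.03, 0.15, 0.05],
           [0.03, 0.02, 0.05, 0.25]]
unrefined = coarsen(refined, [[0, 1], [2, 3]], [[0, 1], [2, 3]])

c_refined = correlation(refined)
c_unrefined = correlation(unrefined)
```

Here `c_refined` is never smaller than `c_unrefined`, in agreement with the statement that refinement never decreases the correlation.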

§4. Monotone decrease of information for stochastic processes 

We consider a sequence of transition-probability matrices $T^n_{ij}$ ($\sum_j T^n_{ij} = 1$
for all $n, i$, and $0 \le T^n_{ij} \le 1$ for all $n, i, j$), and a sequence of
measures $a^n_i$ ($a^n_i \ge 0$) having the property that

$$a^{n+1}_j = \sum_i a^n_i T^n_{ij} . \tag{4.1}$$

We further suppose that we have a sequence of probability distributions,
$P^n_i$, such that

$$P^{n+1}_j = \sum_i P^n_i T^n_{ij} . \tag{4.2}$$

For each of these probability distributions the relative information
$I^n$ (relative to the $a^n_i$ measure) is defined:

$$I^n = \sum_i P^n_i \ln \left( \frac{P^n_i}{a^n_i} \right) . \tag{4.3}$$

Under these circumstances we have the following theorem:

Theorem. $I^{n+1} \le I^n$.
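Proof: Expanding $I^{n+1}$ we get:

$$I^{n+1} = \sum_j P^{n+1}_j \ln \left( \frac{P^{n+1}_j}{a^{n+1}_j} \right) = \sum_j \Big( \sum_i P^n_i T^n_{ij} \Big) \ln \left( \frac{\sum_i P^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}} \right) . \tag{4.4}$$

However, by Lemma 2 (§2, Appendix I) we have the inequality

$$\Big( \sum_i P^n_i T^n_{ij} \Big) \ln \left( \frac{\sum_i P^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}} \right) \le \sum_i P^n_i T^n_{ij} \ln \left( \frac{P^n_i T^n_{ij}}{a^n_i T^n_{ij}} \right) . \tag{4.5}$$

Substitution of (4.5) into (4.4) yields

$$I^{n+1} \le \sum_j \sum_i P^n_i T^n_{ij} \ln \left( \frac{P^n_i}{a^n_i} \right) = \sum_i \Big( \sum_j T^n_{ij} \Big) P^n_i \ln \left( \frac{P^n_i}{a^n_i} \right) = \sum_i P^n_i \ln \left( \frac{P^n_i}{a^n_i} \right) = I^n , \tag{4.6}$$

and the proof is completed.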



This proof can be successively specialized to the case where $T$ is
stationary ($T^n_{ij} = T_{ij}$ for all $n$) and then to the case where $T$ is
doubly-stochastic ($\sum_i T_{ij} = 1$ for all $j$):

COROLLARY 1. $T^n_{ij}$ is stationary ($T^n_{ij} = T_{ij}$, all $n$), and the measure
$a_i$ is a stationary measure ($a_j = \sum_i a_i T_{ij}$), imply that the information,
$I^n = \sum_i P^n_i \ln (P^n_i / a_i)$, is monotone decreasing. (As before,
$P^{n+1}_j = \sum_i P^n_i T_{ij}$.)

Proof: Immediate consequence of preceding theorem.

COROLLARY 2. $T_{ij}$ is doubly-stochastic ($\sum_i T_{ij} = 1$, all $j$) implies
that the information relative to the uniform measure ($a_i = 1$, all $i$),
$I^n = \sum_i P^n_i \ln P^n_i$, is monotone decreasing.

Proof: For $a_i = 1$ (all $i$) we have that $\sum_i a_i T_{ij} = \sum_i T_{ij} = 1 = a_j$.
Therefore the uniform measure is stationary in this case and the result
follows from Corollary 1.

These results hold for the continuous case also, and may be easily 
verified by replacing the above summations by integrations, and by re- 
placing Lemma 2 by its corollary. 
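
The monotone decrease asserted by the theorem can be observed on a small discrete example. The sketch below is illustrative only and rests on stated assumptions: the 3 x 3 matrix `T`, the initial distribution, and the initial measure are hypothetical, the same matrix is used at every step, and the measure is propagated alongside the distribution as in (4.1) and (4.2).

```python
from math import log

def step(dist, T):
    """One application of (4.1)/(4.2): new_j = sum_i dist_i * T[i][j]."""
    n = len(T[0])
    return [sum(dist[i] * T[i][j] for i in range(len(dist))) for j in range(n)]

def relative_information(P, a):
    """I = sum_i P_i ln(P_i / a_i), with 0 ln 0 taken as 0."""
    return sum(p * log(p / m) for p, m in zip(P, a) if p > 0)

# Hypothetical transition matrix; every row sums to 1.
T = [[0.9, 0.1, 0.0],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
P = [1.0, 0.0, 0.0]      # initial probability distribution
a = [1.0, 1.0, 1.0]      # initial measure, propagated by (4.1)

history = []
for _ in range(20):
    history.append(relative_information(P, a))
    P, a = step(P, T), step(a, T)
```

Each entry of `history` is no larger than the one before it, up to rounding. Replacing `a` by a stationary measure of `T` would give the setting of Corollary 1.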

§5. Proof of special inequality for Chapter IV (1.7) 

LEMMA. Given probability densities $P(r)$, $P_1(x)$, $P_2(r)$, with
$P(r) = \int P_1(x) P_2(r - x\tau)\,dx$. Then $I_R \le I_X - \ln \tau$, where
$I_X = \int P_1(x) \ln P_1(x)\,dx$ and $I_R = \int P(r) \ln P(r)\,dr$.

Proof: We first note that:

$$\int P_2(r - x\tau)\,dx = \int P_2(\omega)\,\frac{d\omega}{\tau} = \frac{1}{\tau} \quad (\text{all } r) , \tag{5.1}$$

and that furthermore

$$\int P_2(r - x\tau)\,dr = \int P_2(\omega)\,d\omega = 1 \quad (\text{all } x) . \tag{5.2}$$

We now define the density $P^r(x)$:

$$P^r(x) = \tau P_2(r - x\tau) , \tag{5.3}$$

which is normalized, by (5.1). Then, according to §2, Corollary 1
(Appendix I), we have the relation:

$$\left[\int P^r(x) P_1(x)\,dx\right] \ln \left[\int P^r(x) P_1(x)\,dx\right] \le \int P^r(x) P_1(x) \ln P_1(x)\,dx . \tag{5.4}$$

Substitution from (5.3) gives

$$\left[\tau \int P_2(r - x\tau) P_1(x)\,dx\right] \ln \left[\tau \int P_2(r - x\tau) P_1(x)\,dx\right] \le \tau \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx . \tag{5.5}$$

The relation $P(r) = \int P_1(x) P_2(r - x\tau)\,dx$, together with (5.5), then implies

$$P(r) \ln \tau P(r) \le \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx , \tag{5.6}$$

which is the same as

$$P(r) \ln P(r) \le \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx - P(r) \ln \tau . \tag{5.7}$$

Integrating with respect to $r$, and interchanging the order of integration
on the right side gives:

$$I_R = \int P(r) \ln P(r)\,dr \le \int \left[\int P_2(r - x\tau)\,dr\right] P_1(x) \ln P_1(x)\,dx - (\ln \tau) \int P(r)\,dr . \tag{5.8}$$

But using (5.2) and the fact that $\int P(r)\,dr = 1$ this means that

$$I_R \le \int P_1(x) \ln P_1(x)\,dx - \ln \tau = I_X - \ln \tau , \tag{5.9}$$

and the proof of the lemma is completed.
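
For Gaussian densities both sides of the lemma have closed forms, which allows a quick check. The following is a sketch under an explicit assumption not made in the text: $P_1$ and $P_2$ are taken to be zero-mean Gaussians with variances `var1` and `var2`, so that $P(r)$ is the density of $r = \tau x + \omega$ and is Gaussian with variance $\tau^2\,\mathrm{var1} + \mathrm{var2}$. The function names and sample values are illustrative.

```python
from math import log, pi, e

def gaussian_information(variance):
    """Integral of P ln P for a Gaussian density: -0.5 ln(2 pi e var)."""
    return -0.5 * log(2 * pi * e * variance)

def lemma_sides(var1, var2, tau):
    """Return (I_R, I_X - ln tau) for Gaussian P_1, P_2 (an assumption of
    this sketch); P(r) is then Gaussian with variance tau^2 var1 + var2."""
    i_r = gaussian_information(tau ** 2 * var1 + var2)
    i_x = gaussian_information(var1)
    return i_r, i_x - log(tau)

# Hypothetical (var1, var2, tau) triples chosen only for illustration.
cases = [(1.0, 0.5, 2.0), (0.3, 4.0, 0.7), (2.0, 1e-6, 5.0)]
results = [lemma_sides(*c) for c in cases]
```

In every case the first entry of the pair does not exceed the second, and the two approach each other as `var2` becomes small compared with $\tau^2\,\mathrm{var1}$.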



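§6. Stationary point of $I_K + I_X$

We shall show that the information sum:

$$I_K + I_X = \int_{-\infty}^{\infty} \phi^*\phi(k) \ln \phi^*\phi(k)\,dk + \int_{-\infty}^{\infty} \psi^*\psi(x) \ln \psi^*\psi(x)\,dx , \tag{6.1}$$

where

$$\phi(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx}\, \psi(x)\,dx ,$$

is stationary for the functions:

$$\psi_0(x) = \left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} e^{-x^2/4\sigma_x^2} , \qquad \phi_0(k) = \left(\frac{2\sigma_x^2}{\pi}\right)^{1/4} e^{-k^2\sigma_x^2} , \tag{6.2}$$

with respect to variations of $\psi$, $\delta\psi$, which preserve the normalization

$$\int_{-\infty}^{\infty} \delta(\psi^*\psi)\,dx = 0 . \tag{6.3}$$

The variation $\delta\psi$ gives rise to a variation $\delta\phi$ of $\phi(k)$:

$$\delta\phi = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx}\, \delta\psi\,dx . \tag{6.4}$$

To avoid duplication of effort we first calculate the variation $\delta I_\xi$ for an
arbitrary wave function $u(\xi)$. By definition,

$$I_\xi = \int_{-\infty}^{\infty} u^*(\xi) u(\xi) \ln u^*(\xi) u(\xi)\,d\xi , \tag{6.5}$$

so that

$$\delta I_\xi = \int_{-\infty}^{\infty} \left[ u^*u\, \delta(\ln u^*u) + \delta(u^*u) \ln u^*u \right] d\xi = \int_{-\infty}^{\infty} (1 + \ln u^*u)(u^*\delta u + u\,\delta u^*)\,d\xi . \tag{6.6}$$

We now suppose that $u$ has the real form:

$$u(\xi) = a\, e^{-b\xi^2} = u^*(\xi) , \tag{6.7}$$

and from (6.6) we get

$$\delta I_\xi = \int_{-\infty}^{\infty} (1 + \ln a^2 - 2b\xi^2)\, a\, e^{-b\xi^2} (\delta u)\,d\xi + \text{complex conjugate} . \tag{6.8}$$

We now compute $\delta I_K$ for $\phi_0$ using (6.8), (6.2), and (6.4):

$$\delta I_K \Big|_{\phi_0} = \int_{-\infty}^{\infty} (1 + \ln a'^2 - 2b'k^2)\, a'\, e^{-b'k^2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx}\, \delta\psi\,dx\,dk + \text{c.c.} , \tag{6.9}$$

where

$$a' = \left(\frac{2\sigma_x^2}{\pi}\right)^{1/4} , \qquad b' = \sigma_x^2 .$$

Interchanging the order of integration and performing the definite
integration over $k$ we get:

$$\delta I_K \Big|_{\phi_0} = \int_{-\infty}^{\infty} \frac{a'}{\sqrt{2b'}} \left( \ln a'^2 + \frac{x^2}{2b'} \right) e^{-x^2/4b'}\, \delta\psi(x)\,dx + \text{c.c.} , \tag{6.10}$$

while application of (6.8) to $\psi_0$ gives

$$\delta I_X \Big|_{\psi_0} = \int_{-\infty}^{\infty} (1 + \ln a''^2 - 2b''x^2)\, a''\, e^{-b''x^2}\, \delta\psi(x)\,dx + \text{c.c.} , \tag{6.11}$$

where

$$a'' = \left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} , \qquad b'' = \frac{1}{4\sigma_x^2} .$$

Adding (6.10) and (6.11), and substituting for $a'$, $b'$, $a''$, $b''$, yields:

$$\delta(I_K + I_X) \Big|_{\psi_0} = (1 - \ln \pi) \int_{-\infty}^{\infty} \left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} e^{-x^2/4\sigma_x^2}\, \delta\psi(x)\,dx + \text{c.c.} \tag{6.12}$$

But the integrand of (6.12) is simply $\psi_0(x)\,\delta\psi(x)$, so that

$$\delta(I_K + I_X) = (1 - \ln \pi) \int_{-\infty}^{\infty} \psi_0\, \delta\psi\,dx + \text{c.c.} \tag{6.13}$$

Since $\psi_0$ is real, $\psi_0\,\delta\psi + \text{c.c.} = \psi_0^*\,\delta\psi + \text{c.c.} = \psi_0^*\,\delta\psi + \psi_0\,\delta\psi^* = \delta(\psi^*\psi)$,
so that

$$\delta(I_K + I_X) \Big|_{\psi_0} = (1 - \ln \pi) \int_{-\infty}^{\infty} \delta(\psi^*\psi)\,dx = 0 , \tag{6.14}$$

due to the normality restriction (6.3), and the proof is completed.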
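
As a numerical sanity check, the information sum can be evaluated on the Gaussian pair (6.2) by direct quadrature. The sketch below is illustrative and does not test stationarity against arbitrary variations: it only confirms that, for the squared moduli of the functions in (6.2), the sum takes the same value $-(1 + \ln \pi)$ for every $\sigma_x$ tried. The helper names, the midpoint rule, and the integration range are assumptions of the sketch.

```python
from math import exp, log, pi, sqrt

def integrate(f, lo, hi, n=20000):
    """Midpoint-rule quadrature; adequate for smooth, rapidly decaying f."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def plogp(p):
    """Wrap a density p into the integrand p ln p (taken as 0 where p = 0)."""
    def f(t):
        v = p(t)
        return v * log(v) if v > 0 else 0.0
    return f

def information_sum(sigma_x):
    """I_K + I_X evaluated on the squared moduli of the pair in (6.2)."""
    def px(x):   # psi_0(x)^2
        return sqrt(1 / (2 * pi * sigma_x ** 2)) * exp(-x * x / (2 * sigma_x ** 2))
    def pk(k):   # phi_0(k)^2
        return sqrt(2 * sigma_x ** 2 / pi) * exp(-2 * k * k * sigma_x ** 2)
    wx, wk = 12 * sigma_x, 12 / (2 * sigma_x)   # 12 standard deviations each way
    return integrate(plogp(px), -wx, wx) + integrate(plogp(pk), -wk, wk)

values = [information_sum(s) for s in (0.5, 1.0, 3.0)]
expected = -(1 + log(pi))
```

That the value does not depend on $\sigma_x$ is consistent with the stationarity result, since rescaling $\sigma_x$ is one particular normalization-preserving variation.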



APPENDIX II 



REMARKS ON THE ROLE OF THEORETICAL PHYSICS 

There have been lately a number of new interpretations of quantum 
mechanics, most of which are equivalent in the sense that they predict the 
same results for all physical experiments. Since there is therefore no hope 
of deciding among them on the basis of physical experiments, we must turn 
elsewhere, and inquire into the fundamental question of the nature and pur- 
pose of physical theories in general. Only after we have investigated and 
come to some sort of agreement upon these general questions, i.e., of the 
role of theories themselves, will we be able to put these alternative inter- 
pretations in their proper perspective. 

Every theory can be divided into two separate parts, the formal part, 
and the interpretive part. The formal part consists of a purely logico- 
mathematical structure, i.e., a collection of symbols together with rules 
for their manipulation, while the interpretive part consists of a set of 
"associations," which are rules which put some of the elements of the 
formal part into correspondence with the perceived world. The essential 
point of a theory, then, is that it is a mathematical model, together with 
an isomorphism 1 between the model and the world of experience (i.e., the 
sense perceptions of the individual, or the "real world" — depending upon 
one's choice of epistemology). 



1 By isomorphism we mean a mapping of some elements of the model into elements
of the perceived world which has the property that the model is faithful,
that is, if in the model a symbol A implies a symbol B, and A corresponds 
to the happening of an event in the perceived world, then the event corresponding 
to B must also obtain. The word homomorphism would be technically more 
correct, since there may not be a one-one correspondence between the model and 
the external world. 
The model nature is quite apparent in the newest theories, as in nuclear
physics, and particularly in those fields outside of physics proper, such 
as the Theory of Games, various economic models, etc., where the degree 
of applicability of the models is still a matter of considerable doubt. How- 
ever, when a theory is highly successful and becomes firmly established, 
the model tends to become identified with "reality" itself, and the model 
nature of the theory becomes obscured. The rise of classical physics 
offers an excellent example of this process. The constructs of classical 
physics are just as much fictions of our own minds as those of any other 
theory; we simply have a great deal more confidence in them. It must be
deemed a mistake, therefore, to attribute any more "reality" here than 
elsewhere. 

Once we have granted that any physical theory is essentially only a 
model for the world of experience, we must renounce all hope of finding 
anything like "the correct theory." There is nothing which prevents any 
number of quite distinct models from being in correspondence with experi- 
ence (i.e., all "correct"), and furthermore no way of ever verifying that 
any model is completely correct, simply because the totality of all experi- 
ence is never accessible to us. 

Two types of prediction can be distinguished: the prediction of phenomena
already understood, in which the theory plays simply the role of a
device for compactly summarizing known results (the aspect of most 
interest to the engineer), and the prediction of new phenomena and effects, 
unsuspected before the formulation of the theory. Our experience has 
shown that a theory often transcends the restricted field in which it was 
formulated. It is this phenomenon (which might be called the "inertia" 
of theories) which is of most interest to the theoretical physicist, and 
supplies a greater motive to theory construction than that of aiding the 
engineer. 

From the viewpoint of the first type of prediction we would say that 
the "best" theory is the one from which the most accurate predictions 
can be most easily deduced — two not necessarily compatible ideals. 
Classical physics, for example, permits deductions with far greater ease
than the more accurate theories of relativity and quantum mechanics, and 
in such a case we must retain them all. It would be the worst sort of 
folly to advocate that the study of classical physics be completely dropped 
in favor of the newer theories. It can even happen that several quite dis- 
tinct models can exist which are completely equivalent in their predictions, 
such that different ones are most applicable in different cases, a situation 
which seems to be realized in quantum mechanics today. It would seem 
foolish to attempt to reject all but one in such a situation, where it might 
be profitable to retain them all. 

Nevertheless, we have a strong desire to construct a single all- 
embracing theory which would be applicable to the entire universe. From 
what stems this desire? The answer lies in the second type of prediction 
— the discovery of new phenomena — and involves the consideration of 
inductive inference and the factors which influence our confidence in a 
given theory (to be applicable outside of the field of its formulation). This 
is a difficult subject, and one which is only beginning to be studied seri- 
ously. Certain main points are clear, however, for example, that our con- 
fidence increases with the number of successes of a theory. If a new 
theory replaces several older theories which deal with separate phenomena, 
i.e., a comprehensive theory of the previously diverse fields, then our 
confidence in the new theory is very much greater than the confidence in
any of the older theories, since the range of success of the new theory
is much greater than that of any of the older ones. It is therefore this factor of
confidence which seems to be at the root of the desire for comprehensive 
theories. 

A closely related criterion is simplicity — by which we refer to con- 
ceptual simplicity rather than ease in use, which is of paramount interest 
to the engineer. A good example of the distinction is the theory of general 
relativity which is conceptually quite simple, while enormously cumber- 
some in actual calculations. Conceptual simplicity, like comprehensive- 
ness, has the property of increasing confidence in a theory. A theory 
containing many ad hoc constants and restrictions, or many independent
hypotheses, in no way impresses us as much as one which is largely free 
of arbitrariness. 

It is necessary to say a few words about a view which is sometimes 
expressed, the idea that a physical theory should contain no elements 
which do not correspond directly to observables. This position seems to 
be founded on the notion that the only purpose of a theory is to serve as 
a summary of known data, and overlooks the second major purpose, the 
discovery of totally new phenomena. The major motivation of this view- 
point appears to be the desire to construct perfectly "safe" theories 
which will never be open to contradiction. Strict adherence to such a 
philosophy would probably seriously stifle the progress of physics. 

The critical examination of just what quantities are observable in a 
theory does, however, play a useful role, since it gives an insight into 
ways of modification of a theory when it becomes necessary. A good ex- 
ample of this process is the development of Special Relativity. Such 
successes of the positivist viewpoint, when used merely as a tool for de- 
ciding which modifications of a theory are possible, in no way justify its 
universal adoption as a general principle which all theories must satisfy. 

In summary, a physical theory is a logical construct (model), consist- 
ing of symbols and rules for their manipulation, some of whose elements 
are associated with elements of the perceived world. The fundamental 
requirements of a theory are logical consistency and correctness. There 
is no reason why there cannot be any number of different theories satisfy- 
ing these requirements, and further criteria such as usefulness, simplicity, 
comprehensiveness, pictorability, etc., must be resorted to in such cases 
to further restrict the number. Even so, it may be impossible to give a 
total ordering of the theories according to "goodness," since different 
ones may rate highest according to the different criteria, and it may be 
most advantageous to retain more than one. 

As a final note, we might comment upon the concept of causality. It 
should be clearly recognized that causality is a property of a model, and 
not a property of the world of experience. The concept of causality only
makes sense with reference to a theory, in which there are logical depend- 
ences among the elements. A theory contains relations of the form "A 
implies B," which can be read as "A causes B," while our experi- 
ence, uninterpreted by any theory, gives nothing of the sort, but only a 
correlation between the event corresponding to B and that corresponding 
to A. 